Abstract 

Here we present in a single essay a combination and completion of the 
several aspects of the problem of randomness of individual objects which of 
necessity occur scattered in our text . The reader can consult different 
arrangements of parts of the material in |^ . 
1 Introduction 

Pierre-Simon Laplace (1749 — 1827) has pointed out the following reason why 
intuitively a regular outcome of a random event is unlikely. 

"We arrange in our thought all possible events in various classes; and 
we regard as extraordinary those classes which include a very small 
number. In the game of heads and tails, if head comes up a hundred 
times in a row then this appears to us extraordinary, because the 
almost infinite number of combinations that can arise in a hundred 
throws are divided in regular sequences, or those in which we ob- 
serve a rule that is easy to grasp, and in irregular sequences, that 
are incomparably more numerous". [P.S. Laplace, A Philosophical 
Essay on Probabilities, Dover, 1952. Originally published in 1819. 
Translated from 6th French edition. Pages 16-17.] 

If by 'regularity' we mean that the complexity is significantly less than maximal, 
then the number of all regular events is small (because by simple counting the 
number of different objects of low complexity is small). Therefore, the event 
that any one of them occurs has small probability (in the uniform distribution). 
Yet, the classical calculus of probabilities tells us that 100 heads are just as 
probable as any other sequence of heads and tails, even though our intuition 
tells us that it is less 'random' than some others. Listen to the redoubtable Dr. 
Samuel Johnson (1709—1784): 

"Dr. Beattie observed, as something remarkable which had hap- 
pened to him, that he chanced to see both the No. 1 and the No. 
1000, of the hackney-coaches, the first and the last; 'Why, Sir', said 
Johnson, 'there is an equal chance for one's seeing those two num- 
bers as any other two.' He was clearly right; yet the seeing of two 
extremes, each of which is in some degree more conspicuous than the 
rest, could not but strike one in a stronger manner than the sight 
of any other two numbers." [James Boswell (1740 — 1795), Life of 
Johnson, Oxford University Press, Oxford, UK, 1970. (Edited by 
R.W. Chapman, 1904 Oxford edition, as corrected by J.D. Fleeman, 
third edition. Originally published in 1791.) Pages 1319-1320.] 

Laplace distinguishes between the object itself and a cause of the object. 
"The regular combinations occur more rarely only because they are 
less numerous. If we seek a cause wherever we perceive symmetry, it 
is not that we regard the symmetrical event as less possible than the 
others, but, since this event ought to be the effect of a regular cause 
or that of chance, the first of these suppositions is more probable 
than the second. On a table we see letters arranged in this order 
Constantinople, and we judge that this arrangement 
is not the result of chance, not because it is less possible than others, 
for if this word were not employed in any language we would not 
suspect it came from any particular cause, but this word being in 
use among us, it is incomparably more probable that some person 
has thus arranged the aforesaid letters than that this arrangement 
is due to chance." [P.S. Laplace, Ibid.] 

Let us try to turn Laplace's argument into a formal one. First we introduce 
some notation. If x is a finite binary sequence, then l(x) denotes the length 
(number of occurrences of binary digits) of x. For example, l(010) = 3. 

1.1 Occam's Razor Revisited 

Suppose we observe a binary string x of length l(x) = n and want to know 
whether we must attribute the occurrence of x to pure chance or to a cause. 
To put things in a mathematical framework, we define chance to mean that the 
literal x is produced by independent tosses of a fair coin. More subtle is the 
interpretation of cause as meaning that the computer on our desk computes x 
from a program provided by independent tosses of a fair coin. The chance of 
generating x literally is about 2^{-n}. But the chance of generating x in the form 
of a short program x*, the cause from which our computer computes x, is at 
least 2^{-l(x*)}. In other words, if x is regular, then l(x*) << n, and it is about 
2^{n-l(x*)} times more likely that x arose as the result of computation from some 
simple cause (like a short program x*) than literally by a random process. 
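
The comparison above can be written out as a ratio. The following is a sketch in LaTeX notation; the concrete values n = 100 and l(x*) = 20 in the comment are illustrative assumptions, not figures from the text.

```latex
\[
  \frac{\Pr[\text{the program } x^{*} \text{ is generated}]}
       {\Pr[x \text{ is generated literally}]}
  \;\ge\; \frac{2^{-l(x^{*})}}{2^{-n}}
  \;=\; 2^{\,n - l(x^{*})} .
\]
% Illustrative values (assumed): n = 100 and l(x^*) = 20 give
% 2^{80} \approx 1.2 \times 10^{24}.
```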

This approach will lead to an objective and absolute version of the classic 
maxim of William of Ockham (1290? - 1349?), known as Occam's razor: "if 
there are alternative explanations for a phenomenon, then, all other things being 
equal, we should select the simplest one". One identifies 'simplicity of an object' 
with 'an object having a short effective description'. In other words, a priori 
we consider objects with short descriptions more likely than objects with only 
long descriptions. That is, objects with low complexity have high probability 
while objects with high complexity have low probability. 

This principle is intimately related with problems in both probability theory 
and information theory. These problems as outlined below can be interpreted 
as saying that the related disciplines are not 'tight' enough; they leave things 
unspecified which our intuition tells us should be dealt with. 






1.2 Lacuna of Classical Probability Theory 

An adversary claims to have a true random coin and invites us to bet on the 
outcome. The coin produces a hundred heads in a row. We say that the coin 
cannot be fair. The adversary, however, appeals to probability theory which says 
that each sequence of outcomes of a hundred coin flips is equally likely, 1/2^{100}, 
and one sequence had to come up. 

Probability theory gives us no basis to challenge an outcome after it has 
happened. We could only exclude unfairness in advance by putting a penalty 
side-bet on an outcome of 100 heads. But what about 1010...? What about 
an initial segment of the binary expansion of π? 

    Regular sequence    Pr(00000000000000000000000000) 
    Regular sequence    Pr(01000110110000010100111001) 
    Random sequence     Pr(10010011011000111011010000) 

The first sequence is regular, but what is the distinction between the second 
sequence and the third? The third sequence was generated by flipping a quarter. 
The second sequence is very regular: 0, 1, 00, 01, .... The third sequence will 
pass (pseudo-)randomness tests. 
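
To make the regularity of the second sequence concrete, here is a minimal Python sketch. It assumes the rule "0, 1, 00, 01, ..." means concatenating all binary strings in order of increasing length, and within each length in lexicographic order; the function names are illustrative and not part of the original text.

```python
from itertools import count, product

def enumerated_strings():
    """Yield 0, 1, 00, 01, 10, 11, 000, ... (by length, then lexicographically)."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def regular_prefix(n):
    """First n bits of the concatenation 0 1 00 01 10 11 000 ..."""
    out = ""
    for s in enumerated_strings():
        out += s
        if len(out) >= n:
            return out[:n]

def fair_coin_probability(sequence):
    """Probability of one particular outcome of len(sequence) fair coin flips."""
    return 2.0 ** -len(sequence)

second = "01000110110000010100111001"
third = "10010011011000111011010000"
print(regular_prefix(len(second)) == second)                          # True
print(fair_coin_probability(second) == fair_coin_probability(third))  # True
```

The short generator reproduces the second sequence exactly, while classical probability assigns it the same value as the third sequence.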

In fact, classical probability theory cannot express the notion of randomness 
of an individual sequence. It can only express expectations of properties of 
outcomes of random processes, that is, the expectations of properties of the 
total set of sequences under some distribution. 

Only relatively recently, this problem has found a satisfactory resolution by 
combining notions of computability and statistics to express the complexity of 
a finite object. This complexity is the length of the shortest binary program 
from which the object can be effectively reconstructed. It may be called the 
algorithmic information content of the object. This quantity turns out to be 
an attribute of the object alone, and absolute (in the technical sense of being 
recursively invariant). It is the Kolmogorov complexity of the object. 

1.3 Lacuna of Information Theory 

In |2^, Claude Elwood Shannon (1916 — 2001) assigns a quantity of information 
or entropy to an ensemble of possible messages. All messages in the ensemble 
being equally probable, this quantity is the number of bits needed to count all 
possibilities. 

This expresses the fact that each message in the ensemble can be communi- 
cated using this number of bits. However, it does not say anything about the 
number of bits needed to convey any individual message in the ensemble. To 
illustrate this, consider the ensemble consisting of all binary strings of length 
9999999999999999. 

By Shannon's measure, we require 9999999999999999 bits on the average 
to encode a string in such an ensemble. However, the string consisting of 
9999999999999999 1's can be encoded in about 55 bits by expressing 
9999999999999999 in binary and adding the repeated pattern '1'. A requirement for this to 
work is that we have agreed on an algorithm that decodes the encoded string. 
We can compress the string still further when we note that 9999999999999999 
equals 3^2 × 1111111111111111, and that 1111111111111111 consists of 2^4 1's. 
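
As a rough check of the arithmetic above, the following Python sketch counts the bits needed to write 9999999999999999 in binary and verifies the factorization. The variable names are illustrative assumptions; the "about 55 bits" in the text also covers the cost of naming the repeated pattern, which the sketch does not model.

```python
# Number of 1's in the string discussed above (sixteen 9's).
n = 9999999999999999

# Bits needed to write n itself in binary.
bits_for_length = n.bit_length()

# The further regularity: n = 3^2 * R, where R consists of 2^4 ones.
repunit = int("1" * 2**4)
factorization_holds = (n == 3**2 * repunit)

print(bits_for_length)       # 54
print(factorization_holds)   # True
```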

Thus, we have discovered an interesting phenomenon: the description of 
some strings can be compressed considerably, provided they exhibit enough 
regularity. This observation, of course, is the basis of all systems to express very 
large numbers and was exploited early on by Archimedes (287 BC - 212 BC) in 
his treatise The Sand-Reckoner, in which he proposes a system to name very 
large numbers: 

"There are some, King Gelon, who think that the number of sand 
is infinite in multitude [...or] that no number has been named which 
is great enough to exceed its multitude. [...] But I will try to show 
you, by geometrical proofs, which you will be able to follow, that, 
of the numbers named by me [...] some exceed not only the mass 
of sand equal in magnitude to the earth filled up in the way de- 
scribed, but also that of a mass equal in magnitude to the universe." 
[Archimedes, The Sand-Reckoner, pp. 420-429 in: The World of 
Mathematics, Vol. 1, J.R. Newman, Ed., Simon and Schuster, New 
York, 1956. Page 420.] 

However, if regularity is lacking, it becomes more cumbersome to express large 
numbers. For instance, it seems easier to compress the number 'one billion,' 
than the number 'one billion seven hundred thirty-five million two hundred 
sixty-eight thousand and three hundred ninety-four,' even though they are of 
the same order of magnitude. 

The above example shows that we need too many bits to transmit regular 
objects. The converse problem, too few bits, arises as well since Shannon's 
theory of information and communication deals with the specific technology 
problem of data transmission. That is, with the information that needs to be 
transmitted in order to select an object from a previously agreed upon set of 
alternatives; agreed upon by both the sender and the receiver of the message. 
If we have an ensemble consisting of the Odyssey and the sentence "let's go 
drink a beer" then we can transmit the Odyssey using only one bit. Yet Greeks 






feel that Homer's book has more information content. Our task is to widen 
the limited set of alternatives until it is universal. We aim at a notion of 
'absolute' information of individual objects, which is the information which by 
itself describes the object completely. 

Formulation of these considerations in an objective manner leads again to 
the notion of shortest programs and Kolmogorov complexity. 

2 Randomness as Unpredictability 

What is the proper definition of a random sequence, the 'lacuna in probability 
theory' we have identified above? Let us consider how mathematicians test 
randomness of individual sequences. To measure randomness, criteria have been 
developed which certify this quality. Yet, in recognition that they do not 
measure 'true' randomness, we call these criteria 'pseudo' randomness tests. For 
instance, statistical surveys of initial segments of the sequence of decimal 
digits of π have failed to disclose any significant deviations from randomness. But 
clearly, this sequence is so regular that it can be described by a simple program 
to compute it, and this program can be expressed in a few bits. 

"Any one who considers arithmetical methods of producing random 
digits is, of course, in a state of sin. For, as has been pointed out 
several times, there is no such thing as a random number — there are 
only methods to produce random numbers, and a strict arithmetical 
procedure is of course not such a method. (It is true that a problem 
we suspect of being solvable by random methods may be solvable by 
some rigorously defined sequence, but this is a deeper mathematical 
question than we can go into now.)" [John Louis von Neumann 
(1903 — 1957), Various techniques used in connection with random 
digits, J. Res. Nat. Bur. Stand. Appl. Math. Series, 3(1951), pp. 
36-38. Page 36. Also, Collected Works, Vol. 1, A.H. Taub, Ed., 
Pergamon Press, Oxford, 1963, pp. 768-770. Page 768.] 

This fact prompts more sophisticated definitions of randomness. In his famous 
address to the International Congress of Mathematicians in 1900, David Hilbert 
(1862 — 1943) proposed twenty-three mathematical problems as a program to 
direct the mathematical efforts in the twentieth century. The 6th problem asks: 
"To treat (in the same manner as geometry) by means of axioms, those 
physical sciences in which mathematics plays an important part; in the first 
rank are the theory of probability ...". Thus, Hilbert views probability theory as 
a physical applied theory. This raises the question about the properties one can 
expect from typical outcomes of physical random sources, which a priori has 
no relation whatsoever with an axiomatic mathematical theory of probabilities. 
That is, a mathematical system has no direct relation with physical reality. To 
obtain a mathematical system that is an appropriate model of physical 
phenomena one needs to identify and codify essential properties of the phenomena 
under consideration by empirical observations. 

Notably Richard von Mises (1883 — 1953) proposed notions that approach 
the very essence of true randomness of physical phenomena. This is related 
with the construction of a formal mathematical theory of probability, to form 
a basis for real applications, in the early part of this century. While von Mises' 
objective was to justify the applications to the real phenomena, Andrei Niko- 
laevitch Kolmogorov's (1903 — 1987) classic 1933 treatment constructs a purely 
axiomatic theory of probability on the basis of set theoretic axioms. 

"This theory was so successful, that the problem of finding the basis 
of real applications of the results of the mathematical theory of prob- 
ability became rather secondary to many investigators. ... [however] 
the basis for the applicability of the results of the mathematical the- 
ory of probability to real 'random phenomena' must depend in some 
form on the frequency concept of probability, the unavoidable nature 
of which has been established by von Mises in a spirited manner." 
[A.N. Kolmogorov, On tables of random numbers, Sankhya, Series 
A, 25(1963), 369-376. Page 369.] 

The point made is that the axioms of probability theory are designed so that 
abstract probabilities can be computed, but nothing is said about what prob- 
ability really means, or how the concept can be applied meaningfully to the 
actual world. Von Mises analyzed this issue in detail, and suggested that a 
proper definition of probability depends on obtaining a proper definition of a 
random sequence. This makes him a 'frequentist' — a supporter of the frequency 
theory. 

The following interpretation and formulation of this theory is due to John 
Edensor Littlewood (1885 — 1977), The dilemma of probability theory, Little- 
wood's Miscellany, Revised Edition, B. Bollobas, Ed., Cambridge University 
Press, 1986, pp. 71-73. The frequency theory to interpret probability says, 
roughly, that if we perform an experiment many times, then the ratio of 
favorable outcomes to the total number n of experiments will, with certainty, tend 
to a limit, p say, as n → ∞. This tells us something about the meaning of 
probability, namely, the measure of the positive outcomes is p. But suppose 
we throw a coin 1000 times and wish to know what to expect. Is 1000 enough 
for convergence to happen? The statement above does not say. So we have to 
add something about the rate of convergence. But we cannot assert a certainty 
about a particular number of n throws, such as 'the proportion of heads will 
be p ± ε for large enough n (with ε depending on n)'. We can at best say 'the 
proportion will lie between p ± ε with at least such and such probability 
(depending on ε and n_0) whenever n > n_0'. But now we have defined probability in an 
obviously circular fashion. 
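
The difficulty can be seen in a small simulation. The following Python sketch uses the standard library's pseudo-random generator as a stand-in for a physical coin (an assumption, and exactly the kind of arithmetical method von Neumann warns about); the function name, the seed, and the choice of checkpoints are illustrative only.

```python
import random

def relative_frequencies(n_throws, p=0.5, seed=0):
    """Running proportion of heads after 1, 2, ..., n_throws simulated throws."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    heads = 0
    freqs = []
    for n in range(1, n_throws + 1):
        heads += rng.random() < p   # True counts as 1
        freqs.append(heads / n)
    return freqs

freqs = relative_frequencies(1000)
for n in (10, 100, 1000):
    print(n, freqs[n - 1])
```

Any finite run only shows proportions near p; nothing in the output certifies that n = 1000 is "enough", which is the point made above.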
2.1 Von Mises' Collectives 



In 1919 von Mises proposed to eliminate the problem by simply dividing all 
infinite sequences into special random sequences (called collectives), having relative 
frequency limits, which are the proper subject of the calculus of probabilities, 
and other sequences. He postulates the existence of random sequences (thereby 
circumventing circularity) as certified by abundant empirical evidence, in the 
manner of physical laws, and derives mathematical laws of probability as a 
consequence. In his view a naturally occurring sequence can be nonrandom or 
unlawful in the sense that it is not a proper collective. 

Von Mises views the theory of probabilities insofar as they are 
numerically representable as a physical theory of definitely 
observable phenomena, repetitive or mass events, for instance, as found 
in games of chance, population statistics, Brownian motion. 'Prob- 
ability' is a primitive notion of the theory comparable to those of 
'energy' or 'mass' in other physical theories. 

Whereas energy or mass exist in fields or material objects, proba- 
bilities exist only in the similarly mathematical idealization of collec- 
tives (random sequences). All problems of the theory of probability 
consist of deriving, according to certain rules, new collectives from 
given ones and calculating the distributions of these new collectives. 
The exact formulation of the properties of the collectives is secondary 
and must be based on empirical evidence. These properties are the 
existence of a limiting relative frequency and randomness. 

The property of randomness is a generalization of the abundant 
experience in gambling houses, namely, the impossibility of a suc- 
cessful gambling system. Including this principle in the foundation 
of probability, von Mises argues, we proceed in the same way as the 
physicists did in the case of the energy principle. Here too, the ex- 
perience of hunters of fortune is complemented by solid experience 
of insurance companies and so forth. 

A fundamentally different approach is to justify a posteriori the 
application of a purely mathematically constructed theory of prob- 
ability, such as the theory resulting from the Kolmogorov axioms. 
Suppose we can show that the appropriately defined random 
sequences form a set of measure one, and without exception satisfy 
all laws of a given axiomatic theory of probability. Then it appears 
practically justifiable to assume that as a result of an (infinite) ex- 
periment only random sequences appear. 

Von Mises' notion of infinite random sequence of 0's and 1's (collective) 
essentially appeals to the idea that no gambler, making a fixed number of wagers of 
'heads', at fixed odds [say p versus 1-p] and in fixed amounts, on the flips of a 
coin [with bias p versus 1-p], can have profit in the long run from betting 
according to a system instead of betting at random. Says Alonzo Church 
(1903 - ): "this definition [below] ... while clear as to general intent, is too inexact in 
form to serve satisfactorily as the basis of a mathematical theory." [A. Church, 
On the concept of a random sequence, Bull. Amer. Math. Soc., 46(1940), pp. 
130-135. Page 130.] 

Definition 1 An infinite sequence a_1, a_2, ... of 0's and 1's is a random sequence 
in the special meaning of collective if the following two conditions are satisfied. 

1. Let f_n be the number of 1's among the first n terms of the sequence. Then 

       lim_{n→∞} f_n / n = p,        (1) 

   for some p, 0 < p < 1. 

2. A place-selection rule is a partial function φ, from the finite binary 
   sequences to 0 and 1. It takes the values 0 and 1, for the purpose of 
   selecting one after the other those indices n for which φ(a_1 a_2 ... a_{n-1}) = 1. We 
   require (1), with the same limit p, also for every infinite subsequence 

       a_{n_1} a_{n_2} ... 

   obtained from the sequence by some admissible place-selection rule. (We 
   have not yet formally stated which place-selection rules are admissible.) 

The existence of a relative frequency limit is a strong assumption. Empiri- 
cal evidence from long runs of dice throws, in gambling houses, or with death 
statistics in insurance mathematics, suggests that the relative frequencies are 
apparently convergent. But clearly, no empirical evidence can be given for the 
existence of a definite limit for the relative frequency. However long the test 
run, in practice it will always be finite, and whatever the apparent behavior in 
the observed initial segment of the run, it is always possible that the relative 
frequencies keep oscillating forever if we continue. 

The second condition ensures that no strategy using an admissible place- 
selection rule can select a subsequence which allows different odds for gambling 
than a subsequence which is selected by flipping a fair coin. For example, let 
a casino use a coin with probability p = 1/4 of coming up heads and pay-off 
heads equal 4 times pay-off tails. This 'Law of Excluded Gambling Strategy' 
says that a gambler betting in fixed amounts cannot make more profit in the 
long run betting according to a system than from betting at random. 

"In everyday language we call random those phenomena where we 
cannot find a regularity allowing us to predict precisely their 
results. Generally speaking, there is no ground to believe that 
random phenomena should possess any definite probability. Therefore, 
we should distinguish between randomness proper (as absence of any 
regularity) and stochastic randomness (which is the subject of prob- 
ability theory). There emerges the problem of finding reasons for 
the applicability of the mathematical theory of probability to the 
real world." [A.N. Kolmogorov, On logical foundations of probabil- 
ity theory, Probability Theory and Mathematical Statistics, Lecture 
Notes in Mathematics, Vol. 1021, K. Ito and J.V. Prokhorov, Eds., 
Springer-Verlag, Heidelberg, 1983, pp. 1-5. Page 1.] 

Intuitively, we can distinguish between sequences that are irregular and do not 
satisfy the regularity implicit in stochastic randomness, and sequences that are 
irregular but do satisfy the regularities associated with stochastic randomness. 
Formally, we will distinguish the second type from the first type by whether or 
not a certain complexity measure of the initial segments goes to a definite limit. 
The complexity measure referred to is the length of the shortest description of 
the prefix (in the precise sense of Kolmogorov complexity) divided by its length. 
It will turn out that almost all infinite strings are irregular of the second type 
and satisfy all regularities of stochastic randomness. 

"In applying probability theory we do not confine ourselves to negat- 
ing regularity, but from the hypothesis of randomness of the ob- 
served phenomena we draw definite positive conclusions." [A.N. Kol- 
mogorov, Combinatorial foundations of information theory and the 
calculus of probabilities, Russian Mathematical Surveys, 38:4(1983), 
pp. 29-40. Page 34.] 

Considering the sequence as fair coin tosses with p = 1/2, the second condition 
in Definition 1 says there is no strategy φ (principle of excluded gambling system) 
which assures a player betting at fixed odds and in fixed amounts, on the tosses 
of the coin, to make infinite gain. That is, no advantage is gained in the long 
run by following some system, such as betting 'head' after each run of seven 
consecutive tails, or (more plausibly) by placing the nth bet 'head' after the 
appearance of n + 7 tails in succession. According to von Mises, the above 
conditions are sufficiently familiar and an uncontroverted empirical generalization 
to serve as the basis of an applicable calculus of probabilities. 

Example 1 It turns out that the naive mathematical approach to a concrete 
formulation, admitting simply all partial functions, comes to grief as follows. 
Let a = a_1 a_2 ... be any collective. Define φ_1 as φ_1(a_1 ... a_{i-1}) = 1 if a_i = 1, 
and undefined otherwise. But then p = 1. Defining φ_0 by φ_0(a_1 ... a_{i-1}) = b_i, 
with b_i the complement of a_i, for all i, we obtain by the second condition of 
Definition 1 that p = 0. Consequently, if we allow functions like φ_1 and φ_0 as 
strategy, then von Mises' definition cannot be satisfied at all. ◇ 
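
The two offending rules of Example 1 can be sketched in Python on a finite prefix. The sample sequence is hypothetical, and the helper names are illustrative. Each rule is built from the sequence a itself, which is what makes it inadmissible.

```python
def select(a, rule):
    """Terms a_i for which rule(a_1 ... a_{i-1}) == 1 (undefined means 'skip')."""
    return [a[i] for i in range(len(a)) if rule(a[:i]) == 1]

def make_phi_1(a):
    # phi_1 is tailored to a: it selects position i exactly when a_i = 1,
    # and is undefined (None) otherwise.
    return lambda prefix: 1 if a[len(prefix)] == 1 else None

def make_phi_0(a):
    # phi_0 outputs the complement b_i of a_i, so it selects exactly the 0's.
    return lambda prefix: 1 - a[len(prefix)]

a = [0, 1, 1, 0, 1, 0, 0, 1]       # hypothetical finite prefix of a collective
ones = select(a, make_phi_1(a))
zeros = select(a, make_phi_0(a))
print(ones)    # [1, 1, 1, 1]  -> frequency of 1's is 1
print(zeros)   # [0, 0, 0, 0]  -> frequency of 1's is 0
```

Both subsequences have a frequency of 1's different from that of a, whatever a is, so no sequence passes if such rules are allowed.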
2.2 Wald-Church Place Selection 



In the thirties, Abraham Wald (1902 - 1950) proposed to restrict the a priori 
admissible φ to any fixed countable set of functions. Then collectives do exist. 
But which countable set? In 1940, Alonzo Church proposed to choose a set of 
functions representing 'computable' strategies. According to Church's Thesis, 
this is precisely the set of recursive functions. With recursive φ, not only is 
the definition completely rigorous, and random infinite sequences do exist, but 
moreover they are abundant since the infinite random sequences with p = 1/2 
form a set of measure one. From the existence of random sequences with 
probability 1/2, the existence of random sequences associated with other probabilities 
can be derived. Let us call sequences satisfying Definition 1 with recursive φ 
Mises-Wald-Church random. That is, the involved Mises-Wald-Church 
place-selection rules consist of the partial recursive functions. 

Appeal to a theorem by Wald yields as a corollary that the set of Mises- 
Wald-Church random sequences associated with any fixed probability has the 
cardinality of the continuum. Moreover, each Mises-Wald-Church random se- 
quence qualifies as a normal number. (A number is normal in the sense of Emile 
Felix Edouard Justin Borel (1871 — 1956) if each digit of the base, and each block 
of digits of any length, occurs with equal asymptotic frequency.) Note however, 
that not every normal number is Mises-Wald-Church random. This follows, for 
instance, from Champernowne's sequence (or number), 

0.1234567891011121314151617181920 . . . 

due to David G. Champernowne (1912 — ), which is normal in the scale of 10 
and where the ith digit is easily calculated from i. The definition of a Mises- 
Wald-Church random sequence implies that its consecutive digits cannot be 
effectively computed. Thus, an existence proof for Mises-Wald-Church random 
sequences is necessarily nonconstructive. 
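
That the ith digit of Champernowne's sequence is easily calculated from i can be made concrete. The following Python sketch is one straightforward way to do it (not necessarily the most efficient); the function name is an illustrative assumption and digits are counted from i = 1 after the decimal point.

```python
def champernowne_digit(i):
    """Return the ith digit (i >= 1) of 0.123456789101112..."""
    length = 1    # number of digits of the integers in the current block
    count = 9     # how many integers have that many digits
    start = 1     # first integer of the current block
    # Skip whole blocks of 1-digit, 2-digit, ... integers.
    while i > length * count:
        i -= length * count
        length += 1
        count *= 10
        start *= 10
    number = start + (i - 1) // length      # the integer containing the digit
    return int(str(number)[(i - 1) % length])

prefix = "".join(str(champernowne_digit(i)) for i in range(1, 32))
print(prefix)   # 1234567891011121314151617181920
```

Because such a short program predicts every digit, a computable place-selection rule can exploit it, which is why the sequence is normal but not Mises-Wald-Church random.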

Unfortunately, the von Mises-Wald-Church definition is not yet good enough, 
as was shown by Jean Ville in 1939. There exist sequences that satisfy the 
Mises-Wald-Church definition of randomness, with limiting relative frequency of ones 
of 1/2, but nonetheless have the property that 

    f_n / n ≥ 1/2 for all n. 

The probability of such a sequence of outcomes in random flips of a fair coin is 
zero. Intuition: if you bet '1' all the time against such a sequence of outcomes, 
then your accumulated gain is always positive! Similarly, other properties of 
randomness in probability theory such as the Law of the Iterated Logarithm 
do not follow from the Mises-Wald-Church definition. An extensive survey on 
these issues (and parts of the sequel) is given in j|] . 
3 Randomness as Incompressibility 



Above it turned out that describing 'randomness' in terms of 'unpredictability' is 
problematic and possibly unsatisfactory. Therefore, Kolmogorov tried another 
approach. The antithesis of 'randomness' is 'regularity', and a finite string which 
is regular can be described more shortly than giving it literally. Consequently, a 
string which is 'incompressible' is 'random' in this sense. With respect to infinite 
binary sequences it is seductive to call an infinite sequence 'random' if all of its 
initial segments are 'random' in the above sense of being 'incompressible'. Let 
us see how this intuition can be made formal, and whether it leads to a satisfactory 
solution. 

Intuitively, the amount of effectively usable information in a finite string is 
the size (number of binary digits or bits) of the shortest program that, without 
additional data, computes the string and terminates. A similar definition can 
be given for infinite strings, but in this case the program produces element after 
element forever. Thus, a long sequence of 1's such as 

    11111...1    (10,000 times) 

contains little information because a program of size about log 10,000 bits 
outputs it: 

for i := 1 to 10,000 
print 1 

Likewise, the transcendental number π = 3.1415..., an infinite sequence of 
seemingly 'random' decimal digits, contains but a few bits of information. (There 
is a short program that produces the consecutive digits of π forever.) Such a 
definition would appear to make the amount of information in a string (or other 
object) depend on the particular programming language used. 

Fortunately, it can be shown that all reasonable choices of programming 
languages lead to quantification of the amount of 'absolute' information in indi- 
vidual objects that is invariant up to an additive constant. We call this quantity 
the 'Kolmogorov complexity' of the object. If an object is regular, then it has 
a shorter description than itself. We call such an object 'compressible'. 
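
Compressibility can be illustrated, very roughly, with an off-the-shelf compressor. The Python sketch below uses the standard `zlib` module as a stand-in description method. This is an assumption made for illustration only: the compressed size is merely an upper bound for one particular method, not the Kolmogorov complexity, and the pseudo-random bytes used as the 'irregular' input are themselves produced by a short program.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data (level 9)."""
    return len(zlib.compress(data, 9))

regular = b"1" * 10_000                  # the string 11111...1
rng = random.Random(0)                   # fixed seed, for repeatability
irregular = bytes(rng.getrandbits(8) for _ in range(10_000))

print(len(regular), compressed_size(regular))
print(len(irregular), compressed_size(irregular))
```

The regular string shrinks to a few dozen bytes while the second input does not shrink at all under this method, even though a short program (the generator plus its seed) describes it; a general-purpose compressor finds only some regularities.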

More precisely, suppose we want to describe a given object by a finite binary 
string. We do not care whether the object has many descriptions; however, each 
description should describe but one object. From among all descriptions of an 
object we can take the length of the shortest description as a measure of the 
object's complexity. It is natural to call an object 'simple' if it has at least one 
short description, and to call it 'complex' if all of its descriptions are long. 

But now we are in danger of falling in the trap so eloquently described in the 
Richard-Berry paradox, where we define a natural number as "the least natural 
number that cannot be described in less than twenty words". If this number 
does exist, we have just described it in thirteen words, contradicting its 
definitional statement. If such a number does not exist, then all natural numbers can 
be described in less than twenty words. (This paradox is described in [Bertrand 
Russell (1872 - 1970) and Alfred North Whitehead, Principia Mathematica, 
Oxford, 1917]. In a footnote they state that it "was suggested to us by Mr. G.G. 
Berry of the Bodleian Library".) We need to look very carefully at the notion 
of 'description'. 

Assume that each description describes at most one object. That is, there is a 
specification method D which associates at most one object x with a description 
y. This means that D is a function from the set of descriptions, say Y, into the 
set of objects, say X. It seems also reasonable to require that, for each object 
x in X, there is a description y in Y such that D(y) = x. (Each object has a 
description.) To make descriptions useful we like them to be finite. This means 
that there are only countably many descriptions. Since there is a description 
for each object, there are also only countably many describable objects. How 
do we measure the complexity of descriptions? 

Taking our cue from the theory of computation, we express descriptions as
finite sequences of 0's and 1's. In communication technology, if the specification
method D is known to both a sender and a receiver, then a message x can be
transmitted from sender to receiver by transmitting the sequence of 0's and 1's
of a description y with D(y) = x. The cost of this transmission is measured
by the number of occurrences of 0's and 1's in y, that is, by the length of y.
The least cost of transmission of x is given by the length of a shortest y such
that D(y) = x. We choose this least cost of transmission as the 'descriptional'
complexity of x under specification method D.
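
To make the definition concrete, the following minimal sketch computes the descriptional complexity of a string under one fixed specification method by brute force. The method `toy_D`, the helper names, and the search bound `max_len` are assumptions introduced here for illustration only; they are not part of the theory, and the search is feasible only because `toy_D` is trivially computable.

```python
from itertools import product
from typing import Callable, Iterator, Optional


def all_binary_strings(max_len: int) -> Iterator[str]:
    """Yield binary strings by increasing length: '', '0', '1', '00', ..."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def toy_D(y: str) -> Optional[str]:
    """Hypothetical specification method (an assumption, not from the text).

    '0' + w describes w literally; '1' + w describes w repeated four times.
    The empty description describes nothing.
    """
    if y.startswith("0"):
        return y[1:]
    if y.startswith("1"):
        return y[1:] * 4
    return None


def complexity(x: str, D: Callable[[str], Optional[str]], max_len: int) -> Optional[int]:
    """Length of a shortest y (up to max_len bits) with D(y) == x, else None."""
    for y in all_binary_strings(max_len):
        if D(y) == x:
            return len(y)
    return None


# A regular string gets a short description, an irregular one does not.
print(complexity("01010101", toy_D, 10))  # 3, via the description '101'
print(complexity("0110", toy_D, 10))      # 5, only the literal description works
```

Changing `toy_D` changes the numbers, which is exactly the dependence on D discussed next.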

Obviously, this descriptional complexity of x depends crucially on D. The 
general principle involved is that the syntactic framework of the description 
language determines the succinctness of description. 

In order to objectively compare descriptional complexities of objects, to be
able to say "x is more complex than z", the descriptional complexity of x should
depend on x alone. This complexity can be viewed as related to a universal
description method which is a priori assumed by all senders and receivers. This
complexity is optimal if no other description method assigns a lower complexity
to any object.

We are not really interested in optimality with respect to all description
methods. For specifications to be useful at all it is necessary that the mapping
from y to D(y) can be executed in an effective manner. That is, it can at least in
principle be performed by humans or machines. This notion has been formalized
as 'partial recursive functions'. According to generally accepted mathematical
viewpoints it coincides with the intuitive notion of effective computation.

The set of partial recursive functions contains an optimal function which
minimizes description length of every other such function. We denote this
function by $D_0$. Namely, for any other recursive function D, for all objects x,
there is a description y of x under $D_0$ which is shorter than any description z
of x under D. (That is, shorter up to an additive constant which is independent
of x.) Complexity with respect to $D_0$ minorizes the complexities with respect to
all partial recursive functions.
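
In symbols, the optimality just described can be written as follows. The notation $C_D(x)$ for the length of a shortest description of x under D, and the constant names $c_D$ and $c$, are assumptions introduced here for readability; $D_0$ and $D_0'$ stand for optimal specification functions.

```latex
\[
  \text{for every partial recursive } D \text{ there is a constant } c_D
  \text{ such that for all } x : \quad
  C_{D_0}(x) \le C_D(x) + c_D .
\]
\[
  \text{Consequently, for two optimal functions } D_0 \text{ and } D_0'
  \text{ there is a constant } c \text{ with } \quad
  |C_{D_0}(x) - C_{D_0'}(x)| \le c \quad \text{for all } x .
\]
```

The second line follows by applying the first line twice, once with each optimal function in the role of D.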

We identify the length of the description of x with respect to a fixed
specification function $D_0$ with the 'algorithmic (descriptional or Kolmogorov)
complexity' of x. The optimality of $D_0$ in the sense above means that the
complexity of an object x is invariant (up to an additive constant independent
of x) under transition from one optimal specification function to another. Its
complexity is an objective attribute of the described object alone: it is an
intrinsic property of that object, and it does not depend on the description
formalism. This complexity can be viewed as 'absolute information content': the
amount of information which needs to be transmitted between all senders and
receivers when they communicate the message in absence of any other a priori
knowledge which restricts the domain of the message.

Broadly speaking, this means that all description syntaxes which are powerful
enough to express the partial recursive functions are approximately equally
succinct. The remarkable usefulness and inherent rightness of the theory of
Kolmogorov complexity stems from this independence of the description method.
Thus, we have outlined the program for a general theory of algorithmic
complexity. The four major innovations are as follows.

1. In restricting ourselves to formally effective descriptions our definition
covers every form of description that is intuitively acceptable as being
effective according to general viewpoints in mathematics and logic.

2. The restriction to effective descriptions entails that there is a universal
description method that minorizes the description length or complexity
with respect to any other effective description method. This would not
be the case if we considered, say, all noneffective description methods.
Significantly, this implies Item 3.

3. The description length or complexity of an object is an intrinsic attribute
of the object independent of the particular description method or
formalizations thereof.

4. The disturbing Richard-Berry paradox above does not disappear, but
resurfaces in the form of an alternative approach to proving Kurt Gödel's
(1906-1978) famous result that not every true mathematical statement
is provable in mathematics.

3.1 Kolmogorov Complexity 

To make this treatment precise and self-contained we briefly review notions and
properties needed in the sequel. We identify the natural numbers $\mathcal{N}$ and the
finite binary sequences as

$(0, \epsilon), (1, 0), (2, 1), (3, 00), (4, 01), \ldots,$

where $\epsilon$ is the empty sequence. The length l(x) of a natural number x is the
number of bits in the corresponding binary sequence. For instance, $l(\epsilon) = 0$. If
A is a set, then |A| denotes the cardinality of A. In some cases we want to encode
x in self-delimiting form x', in order to be able to decompose x'y into x and
y. Short codes are obtained by iterating the simple rule that a self-delimiting
(s.d.) description of the length of x followed by x itself is a s.d. description of
x. For example, both $x' = 1^{l(x)}0x$ and $x'' = 1^{l(l(x))}0\,l(x)\,x$ are s.d. descriptions
for x, and $l(x') \le 2l(x) + O(1)$ and $l(x'') \le l(x) + 2l(l(x)) + O(1)$. The string
x'' delimits itself in a concatenation x''y because an algorithm to
retrieve x works as follows. First count the number of 1's with which x''y starts
out until we find the first 0. This count is the length of the length of x, that
is, the length of l(x), which is l(l(x)). The binary substring of length l(l(x))
following the first 0 is the binary representation of l(x). The next substring
of length l(x) following it is x itself. So we can retrieve x without having to
consider even one bit in the string after x. This is why the binary code x'' for
x is called self-delimiting.
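
The retrieval procedure just described can be sketched in a few lines. The function names below are assumptions made for this illustration; the number-to-string correspondence is the identification $(0, \epsilon), (1, 0), (2, 1), \ldots$ given above, so that l(x) is written as the l(x)-th binary string.

```python
from typing import Tuple


def nat_to_str(n: int) -> str:
    """The n-th binary string under 0 -> '', 1 -> '0', 2 -> '1', 3 -> '00', ..."""
    return bin(n + 1)[3:]  # binary of n + 1 with the leading '1' removed


def str_to_nat(s: str) -> int:
    """Inverse of nat_to_str."""
    return int("1" + s, 2) - 1


def sd_encode(x: str) -> str:
    """x'' = 1^{l(l(x))} 0 l(x) x."""
    length = nat_to_str(len(x))
    return "1" * len(length) + "0" + length + x


def sd_decode(stream: str) -> Tuple[str, str]:
    """Split a stream that starts with x'' into (x, remainder)."""
    k = stream.index("0")                        # number of leading 1's = l(l(x))
    length = str_to_nat(stream[k + 1:2 * k + 1])  # the next k bits encode l(x)
    start = 2 * k + 1
    return stream[start:start + length], stream[start + length:]


x, y = "10110", "0111"
code = sd_encode(x)
print(code)                 # 1101010110
print(sd_decode(code + y))  # ('10110', '0111')
```

The decoder never inspects the bits of y, which is the self-delimiting property; the code length is l(x) + 2l(l(x)) + 1 bits.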

Let $\langle \cdot, \cdot \rangle : \mathcal{N} \times \mathcal{N} \to \mathcal{N}$ denote a standard computable bijective 'pairing'
function. Throughout this paper, we will assume that

$\langle x, y \rangle = x''y.$

This way, x'' has the length of x but for an additional logarithmic term.
Define $\langle x, y, z \rangle$ by $\langle x, \langle y, z \rangle \rangle$.

We need some notions from the theory of algorithms, see [?]. Let $\phi_1, \phi_2, \ldots$
be a standard enumeration of the partial recursive functions. The (Kolmogorov)
complexity of $x \in \mathcal{N}$, given y, is defined as

$C(x|y) = \min\{ l(\langle n, z \rangle) : \phi_n(\langle y, z \rangle) = x \}.$

This means that C(x|y) is the minimal number of bits in a description from
which x can be effectively reconstructed, given y. The unconditional complexity
is defined as $C(x) = C(x|\epsilon)$.

An alternative definition is as follows. Let

$C_\psi(x|y) = \min\{ l(z) : \psi(\langle y, z \rangle) = x \}$    (1)

be the conditional complexity of x given y with reference to decoding function
$\psi$. Then $C(x|y) = C_\psi(x|y)$ for a universal partial recursive function $\psi$ that
satisfies $\psi(\langle y, n, z \rangle) = \phi_n(\langle y, z \rangle)$.

We will also make use of the prefix complexity K(x), which denotes the length of
the shortest self-delimiting description. To this end, we consider so-called prefix
Turing machines, which have only 0's and 1's on their input tape, and thus
cannot detect the end of the input. Instead we define an input as that part of
the input tape which the machine has read when it halts. When $x \ne y$ are two
such inputs, we clearly have that x cannot be a prefix of y, and hence the set of
inputs forms what is called a prefix code. We define K(x) similarly as above,
with reference to a universal prefix machine that first reads $1^n 0$ from the input
tape and then simulates prefix machine n on the rest of the input.
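
As a small illustration of the prefix-code property, the sketch below checks that no code word is a prefix of another and computes the sum of $2^{-l(w)}$ over the code words, which cannot exceed 1 for a prefix code (Kraft's inequality, a standard fact not stated in the text). The helper names and the choice of the simple code $1^{l(x)}0x$ as test input are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Iterable


def is_prefix_free(words: Iterable[str]) -> bool:
    """True if no word in the set is a proper prefix of another word."""
    ws = list(set(words))
    return not any(a != b and b.startswith(a) for a in ws for b in ws)


def kraft_sum(words: Iterable[str]) -> Fraction:
    """Sum of 2^(-length) over the distinct words."""
    return sum((Fraction(1, 2 ** len(w)) for w in set(words)), Fraction(0))


def sd(x: str) -> str:
    """Simple self-delimiting code 1^{l(x)} 0 x."""
    return "1" * len(x) + "0" + x


# All binary strings of length at most 3, and their self-delimiting codes.
plain = ["".join(b) for n in range(4) for b in product("01", repeat=n)]
coded = [sd(x) for x in plain]

print(is_prefix_free(plain))  # False: '0' is a prefix of '01'
print(is_prefix_free(coded))  # True
print(kraft_sum(coded))       # 15/16
```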

A survey is [?]. We need the following properties. Throughout, 'log' denotes
the binary logarithm. We often use $O(f(n)) = -O(f(n))$, so that $O(f(n))$ may
denote a negative quantity. For each $x, y \in \mathcal{N}$ we have

$C(x|y) \le l(x) + O(1).$    (2)

For each $y \in \mathcal{N}$ there is an x of length n such that $C(x|y) \ge n$. In
particular, we can set $y = \epsilon$. Such x's may be called random, since they are
without regularities that can be used to compress the description. Intuitively,
the shortest effective description of x is x itself. In general, for each n and y,
there are at least $2^n - 2^{n-c} + 1$ distinct x's of length n with

$C(x|y) \ge n - c.$    (3)

In some cases we want to encode x in self-delimiting form x', in order to
be able to decompose x'y into x and y. Good upper bounds on the prefix
complexity of x are obtained by iterating the simple rule that a self-delimiting
(s.d.) description of the length of x followed by x itself is a s.d. description of
x. Since $x' = 1^{l(x)}0x$ and $x'' = 1^{l(l(x))}0\,l(x)\,x$ are both s.d. descriptions for x,
this shows that $K(x) \le 2l(x) + O(1)$ and $K(x) \le l(x) + 2l(l(x)) + O(1)$.

Similarly, we can encode x in a self-delimiting form of its shortest program
p(x) (with l(p(x)) = C(x)) in 2C(x) + 1 bits. Iterating this scheme, we can encode
x as a self-delimiting program of $C(x) + 2\log C(x) + 1$ bits, which shows that
$K(x) \le C(x) + 2\log C(x) + 1$, and so on.

If $\omega = \omega_1 \ldots \omega_k \ldots \omega_l \ldots$ is a finite or infinite string then $\omega_{k:l}$ denotes
$\omega_k \ldots \omega_l$, a string of length $l - k + 1$.

3.2 Complexity Oscillations

Consider the question of how C behaves in terms of increasingly long initial
segments of a fixed infinite binary sequence (or string) $\omega$. For instance, is it
monotone in the sense that $C(\omega_{1:m}) \le C(\omega_{1:n})$, or $C(\omega_{1:m}|m) \le C(\omega_{1:n}|n)$, for
all infinite binary sequences $\omega$ and all $m \le n$? We can readily show that the
answer is negative in both cases. A similar effect arises when we try to use
Kolmogorov complexity to solve the problem of finding a proper definition of
random infinite sequences (collectives) according to the task already set by von
Mises in 1919, see Section 0.

Kolmogorov's intention was to call an infinite binary sequence $\omega$ 'random' if
there is a constant c such that, for all n, the n-length prefix $\omega_{1:n}$ has
$C(\omega_{1:n}) \ge n - c$. However, such sequences do not exist. A simple argument shows
that even for high-complexity sequences, with $C(\omega_{1:n}) \ge n - \log n - 2\log\log n$
for all n, this results in so-called complexity oscillations, where

$\frac{n - C(\omega_{1:n})}{\log n}$

oscillates between 0 and about 1.

We show that the C complexity of prefixes of each infinite binary sequence
drops infinitely often unboundedly far below its own length. Let $\omega$ be any infinite
binary sequence, and $\omega_{1:m}$ any m-length prefix of $\omega$. If $\omega_{1:m}$ is the nth binary
string in the lexicographical order 0, 1, 00, ..., that is, $n = \omega_{1:m}$, $m = l(n)$, then
$C(\omega_{1:n}) \le C(\omega_{m+1:n}) + c$, with c a constant independent of n or m. Namely,
with O(1) additional bits of information, we can trivially reconstruct the nth
binary string $\omega_{1:m}$ from the length $n - l(n)$ of $\omega_{m+1:n}$. Then we find that
$C(\omega_{m+1:n}) \le n - l(n) + c$ for some constant c independent of n, whence the
claimed result follows.
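
The reconstruction step in this argument can be made explicit. The sketch below recovers the whole prefix $\omega_{1:n}$ from the suffix $\omega_{m+1:n}$ alone, using the number-to-string identification of Section 3.1. The function names are assumptions for illustration. The map from n to n - l(n) is nondecreasing but not one-to-one, so a small index `which` selects among the few candidate values of n; this index plays the role of the O(1) additional bits.

```python
from typing import List


def nat_to_str(n: int) -> str:
    """The n-th binary string under 0 -> '', 1 -> '0', 2 -> '1', 3 -> '00', ..."""
    return bin(n + 1)[3:]


def str_to_nat(s: str) -> int:
    """Inverse of nat_to_str."""
    return int("1" + s, 2) - 1


def candidates(suffix_len: int) -> List[int]:
    """All n with n - l(n) == suffix_len, where l(n) = len(nat_to_str(n))."""
    found = []
    n = suffix_len
    while n - len(nat_to_str(n)) <= suffix_len:
        if n - len(nat_to_str(n)) == suffix_len:
            found.append(n)
        n += 1
    return found


def reconstruct(suffix: str, which: int) -> str:
    """Rebuild omega_{1:n} from omega_{m+1:n} and an O(1)-size index."""
    n = candidates(len(suffix))[which]
    return nat_to_str(n) + suffix


omega = "1011100100110000"       # an arbitrary example prefix of some sequence
m = 3
n = str_to_nat(omega[:m])        # omega_{1:3} = '101' is the 12th string
suffix = omega[m:n]              # omega_{4:12}, of length n - l(n) = 9
print(n, candidates(len(suffix)))            # 12 [12]
print(reconstruct(suffix, 0) == omega[:n])   # True
```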

Our approach in this proof makes it easy to say something about the frequency
of these complexity oscillations. Define a 'wave' function w by $w(1) = 2$
and $w(i) = 2^{w(i-1)}$; then the above argument guarantees that there are at least
k values $n_1, n_2, \ldots, n_k$ less than $n = w(k)$ such that $C(\omega_{1:n_i}) \le n_i - g(n_i) + c$
for all $i = 1, 2, \ldots, k$. Obviously, this can be improved.

The upper bound on the oscillations, $C(\omega_{1:n}) = n + O(1)$, is reached infinitely
often for almost all high-complexity sequences. Furthermore, the oscillations of
all high-complexity sequences stay above $n - \log n - 2\log\log n$, but dip infinitely
often below $n - \log n$.

Due to the complexity oscillations the idea of identifying random infinite
sequences with those such that $C(\omega_{1:n}) \ge n - c$, for all n, is trivially infeasible.



That is the bad news. In contrast, a similar approach in Section 4.3 for finite
binary sequences turned out to work just fine. Its justification was found in Per
Martin-Löf's (1942- ) important insight that to justify any proposed definition
of randomness one has to show that the sequences, which are random in the
stated sense, satisfy the several properties of stochasticity we know from the
theory of probability. Instead of proving each such property separately, one
may be able to show, once and for all, that the random sequences introduced
possess, in an appropriate sense, all possible properties of stochasticity.

In Section [?] we show how to define randomness of infinite sequences in
Martin-Löf's sense, which is a formal expression of the attribute of satisfying
all effectively testable laws of randomness, and hence satisfactorily resolves the
quest for a proper definition of random sequences. Without proof we state the
relation between Martin-Löf randomness and high Kolmogorov complexity. Let
$\omega$ be an infinite binary sequence.

(i) If there exists a constant c such that $C(\omega_{1:n}) \ge n - c$, for infinitely many
n, then $\omega$ is random in the sense of Martin-Löf with respect to the uniform
measure.

(ii) The set of $\omega$, for which there exists a constant c and infinitely many n
such that $C(\omega_{1:n}) \ge n - c$, has uniform measure one.

Hence, the set of random sequences not satisfying the condition (i) has
uniform measure zero.

The idea that random infinite sequences are those sequences such that the
complexity of the initial n-length segments is at most a fixed additive constant
below n, for all n, is one of the first-rate ideas in the area of Kolmogorov
complexity. In fact, this was one of the motivations for Kolmogorov to invent
Kolmogorov complexity in the first place. We have seen in Section 4.4 that this
does not work for the plain Kolmogorov complexity $C(\cdot)$, due to the complexity
oscillations. The next result is important, and is a culmination of the theory.
For prefix complexity $K(\cdot)$ it is indeed the case that random sequences are those
sequences for which the complexity of each initial segment is at least its length.

The history of invention of Kolmogorov complexity is presented in
detail in [?]. Here it is important that A.N. Kolmogorov suggested
the above relation between the complexity of initial segments and
randomness of infinite sequences already in 1964 [see for example
A.N. Kolmogorov, IEEE Trans. Inform. Theory, IT-14:5(1968),
662-664]. Similar suggestions were explicitly or implicitly made in
some form by R. Solomonoff in 1960, [16], and Gregory J. Chaitin
(1947- ) in 1969, [1]. This approach being incorrect using $C(\cdot)$
complexity, P. Martin-Löf, who visited Kolmogorov in Moscow in 1964
and 1965, already in 1965 [published in Inform. Contr., 9(1966),
602-619] developed the theory as put forth in Section 4.4. Nevertheless,
Kolmogorov did not abandon the general outlines of his original idea
of connecting randomness of infinite sequences with complexity, see
pp. 405-406 of [Vladimir Andreevich Uspensky (1930- ), J. Symb.
Logic, 57:2(1992), 385-412].

Simultaneously and independently, Claus P. Schnorr [J. Comput.
Syst. Sci., 7(1973), 376-388] for the uniform distribution, and
Leonid A. Levin (1948- ) [Sov. Math. Dokl., 14(1973), 1413-1416]
for arbitrary computable distributions, introduced unequal versions
of complexity (respectively, process complexity and the monotone
variant of complexity Km(x)), in order to characterize randomness.
They gave the first executions of Kolmogorov's idea by showing that
an infinite sequence $\omega$ is random iff $|Km(\omega_{1:n}) - n| = O(1)$ (Levin)
and a similar statement for process complexity (Schnorr).

In 1974 L.A. Levin in [?], see also Peter Gacs (1947- ) [?], and
independently in 1975 G.J. Chaitin in [?], introduced the prefix version
of Kolmogorov complexity which we have denoted by $K(\cdot)$. Chaitin
in his paper again proposed calling an infinite sequence $\omega$ random iff
$K(\omega_{1:n}) \ge n - O(1)$ for all n. Happily, this proposal characterizes
once more precisely those infinite binary sequences which are random
in Martin-Löf's sense (without proof attributed to C.P. Schnorr,
1974, in Chaitin's paper). This important result was widely circulated,
but the first published proof appears, perhaps, as Corollary 3.2
in [Vladimir Vyacheslavovich V'yugin (1948- ), Semiotika i Informatika,
16(1981), 14-43, in Russian]. See for historical notes [A.N.
Kolmogorov and V.A. Uspensky, SIAM J. Theory Probab. Appl.,
32(1987), 387-412].

Another equivalent proposal in terms of constructive measure
theory was given by Robert M. Solovay (1938- ). The fact that
such different effective formalizations of infinite random sequences
turn out to define the same mathematical object constitutes evidence
that our intuitive notion of infinite sequences which are effectively
random coincides with the precise notion of Martin-Löf random
infinite sequences.

3.3 Relation with Unpredictability 

We recall von Mises' classic approach to obtain infinite random sequences $\omega$ as
treated in Section [?], which formed a primary inspiration to the work reported
in this section. It is of great interest whether one can, in his type of formulation,
capture the intuitively and mathematically satisfying notion of infinite random
sequence in the sense of Martin-Löf. According to von Mises an infinite binary
sequence $\omega$ was random (a collective) if:

1. $\omega$ has the property of frequency stability with limit p; that is, if
$f_n = \omega_1 + \omega_2 + \cdots + \omega_n$, then the limit of $f_n/n$ exists and equals p.

2. Any subsequence of $\omega$ chosen according to an admissible place-selection
rule has frequency stability with the same limit p as in condition 1.

One major problem was how to define 'admissible', and one choice was to
identify it with Church's notion of selecting a subsequence $\zeta_1\zeta_2\ldots$ of $\omega_1\omega_2\ldots$
by a partial recursive function $\phi$ by $\zeta_n = \omega_m$ if $\phi(\omega_{1:r}) = 0$ for precisely $n - 1$
instances of r with $r < m - 1$ and $\phi(\omega_{1:m-1}) = 0$. We called these $\phi$ 'place-selection
rules according to Mises-Church,' and the resulting sequences $\zeta$ Mises-Wald-Church
random.
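
A place-selection rule of this kind can be sketched as follows: the bit $\omega_m$ is selected exactly when the rule outputs 0 on the preceding bits $\omega_{1:m-1}$, so the decision never depends on the bit being selected. The rule `after_a_one` and the function names are hypothetical examples chosen here; for simplicity the rule is total, whereas the definition above allows partial recursive functions.

```python
from typing import Callable


def select(omega: str, phi: Callable[[str], int]) -> str:
    """Subsequence of omega chosen by phi: keep omega_m iff phi(omega_{1:m-1}) == 0."""
    return "".join(omega[m] for m in range(len(omega)) if phi(omega[:m]) == 0)


def after_a_one(prefix: str) -> int:
    """Hypothetical rule: select the next bit iff the previous bit was 1."""
    return 0 if prefix.endswith("1") else 1


def frequency(bits: str) -> float:
    """Relative frequency of ones in a nonempty finite string."""
    return bits.count("1") / len(bits)


periodic = "01" * 8
chosen = select(periodic, after_a_one)
print(frequency(periodic))  # 0.5
print(chosen)               # 0000000
print(frequency(chosen))    # 0.0
```

The periodic sequence has limiting frequency 1/2, yet this simple rule selects a subsequence with frequency 0, so the sequence fails condition 2 and is not a collective.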

In Section [?] we stated that there are Mises-Wald-Church random sequences
with limiting frequency 1/2 which do not satisfy effectively testable properties
of randomness like the Law of the Iterated Logarithm or the Recurrence
Property. (Such properties are by definition satisfied by sequences which are
Martin-Löf random.) In fact, the distinction between the two is quite large,
since there are Mises-Wald-Church collectives $\omega$ with limiting frequency 1/2
such that $C(\omega_{1:n}) = O(f(n)\log n)$ for every unbounded, nondecreasing, total
recursive function f. Such collectives are very nonrandom sequences from the
viewpoint of Martin-Löf randomness, where $C(\omega_{1:n})$ is required to be asymptotic
to n. See Robert P. Daley (1944- ), Math. Syst. Theory, 9(1975), 83-94.
It is interesting to point out that although a Mises-Wald-Church random
sequence may have very low Kolmogorov complexity, it in fact has very high
time-bounded Kolmogorov complexity. If we consider also sequences with
limiting frequencies different from 1/2, then it becomes even easier to find
sequences which are random according to Mises-Wald-Church, but not according
to Martin-Löf. Namely, any sequence $\omega$ with limiting relative frequency p has
complexity $C(\omega_{1:n}) \le Hn + o(n)$ for $H = -(p\log p + (1-p)\log(1-p))$ (H
is Shannon's entropy). This means that for each $\epsilon > 0$ there are Mises-Wald-Church
random sequences $\omega$ with $C(\omega_{1:n}) < \epsilon n$ for all but finitely many n.
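
The entropy bound can be checked numerically for finite strings. One way to see it, sketched here under the assumption of a simple two-part description, is that a string of length n with k ones is determined by n, k, and its index among all such strings; the index takes about $\log \binom{n}{k}$ bits, which never exceeds $nH(k/n)$, while n and k cost only $O(\log n)$ bits. The function names are illustrative.

```python
from math import comb, log2


def binary_entropy(p: float) -> float:
    """Shannon entropy H(p) = -(p log p + (1 - p) log(1 - p)), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def index_bits(n: int, k: int) -> float:
    """Bits needed to index one string among all length-n strings with k ones."""
    return log2(comb(n, k))


n = 1000
for k in (500, 250, 100, 10):
    # The index length stays below n * H(k/n), and both shrink as the
    # frequency k/n moves away from 1/2.
    print(k, round(index_bits(n, k), 1), round(n * binary_entropy(k / n), 1))
```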

On the other hand, clearly all Martin-Löf random sequences are also
Mises-Wald-Church random (each admissible selection rule is an effective
sequential test).

3.4 Kolmogorov-Loveland Place Selection 

This suggests that we have to liberate our notion of admissible selection rule
somewhat in order to capture the proper notion of an infinite random sequence
using von Mises' approach. A proposal in this direction was given by A.N.
Kolmogorov [Sankhya, Ser. A, 25(1963), 369-376] and Donald William Loveland
(1934- ) [Z. Math. Logik Grundl. Math. 12(1966), 279-294].

A Kolmogorov-Loveland admissible selection function to select an infinite
subsequence $\zeta_1\zeta_2\ldots$ from $\omega = \omega_1\omega_2\ldots$ is any one-one recursive function
$\phi : \{0,1\}^* \to \mathcal{N} \times \{0,1\}$ from binary strings to (index, bit) pairs. The index gives
the next position in $\omega$ to be scanned, and the bit indicates if the scanned bit of
$\omega$ must be included in $\zeta_1\zeta_2\ldots$. More precisely,

$\phi(z_1 \ldots z_{m-1}) = (i_m, a_m)$, with $z_j = \omega_{i_j}$, $1 \le j < m$.

The sequence $\zeta_1\zeta_2\ldots$ is called a Kolmogorov-Loveland random sequence. Because
$\phi$ is one-one, it must hold that $i_m \ne i_1, i_2, \ldots, i_{m-1}$. The described process
yields a sequence of $\phi$-values $(z_1, a_1), (z_2, a_2), \ldots$. The selected subsequence
$\zeta_1\zeta_2\ldots$ consists of the ordered sequence of those $z_i$'s of which the associated
$a_i$'s equal one.

As compared to the Mises-Wald-Church approach, the liberation is contained
in the fact that the order of succession of the terms in the subsequence
chosen is not necessarily the same as that of the original sequence. In
comparison, it is not obvious (and it seems to be unknown) whether a subsequence
$\zeta_1\zeta_2\ldots$ selected from a Kolmogorov-Loveland random sequence $\omega_1\omega_2\ldots$ by a
Kolmogorov-Loveland place-selection rule is itself a Kolmogorov-Loveland random
sequence. Note that the analogous property necessarily holds for
Mises-Wald-Church random sequences.
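
The scanning process described above can be sketched on a finite prefix as follows. The rule maps the bits scanned so far to an (index, bit) pair; the sketch enforces directly that no position is scanned twice, rather than checking that the rule is one-one. The rule `swap_pairs`, which visits positions in the order 2, 1, 4, 3, ... and keeps every scanned bit, is a hypothetical example chosen to show non-monotone selection; all names here are assumptions.

```python
from typing import Callable, List, Set, Tuple


def kl_select(omega: str, phi: Callable[[str], Tuple[int, int]], steps: int) -> str:
    """Run a Kolmogorov-Loveland style selection on a finite prefix omega."""
    scanned = ""            # z_1 z_2 ...: bits read so far, in scan order
    seen: Set[int] = set()  # positions already scanned
    selected: List[str] = []
    for _ in range(steps):
        i, a = phi(scanned)  # next position (1-based) and include bit
        if i in seen or not 1 <= i <= len(omega):
            break            # repeated position, or outside the finite prefix
        seen.add(i)
        z = omega[i - 1]
        if a == 1:
            selected.append(z)
        scanned += z
    return "".join(selected)


def swap_pairs(scanned: str) -> Tuple[int, int]:
    """Hypothetical rule: scan positions 2, 1, 4, 3, 6, 5, ... and keep every bit."""
    m = len(scanned) + 1
    return (m + 1 if m % 2 == 1 else m - 1, 1)


print(kl_select("011010", swap_pairs, 6))  # 100101
```

The selected bits appear in scan order, not in their original order, which is the liberation relative to the Mises-Wald-Church rules.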

The set of Kolmogorov-Loveland random sequences is contained in the set
of Mises-Wald-Church random sequences and contains the set of Martin-Löf
random sequences. If $\omega_1\omega_2\ldots$ is Kolmogorov-Loveland random then clearly
$\zeta_1\zeta_2\ldots$, defined by $\zeta_i = \omega_{\sigma(i)}$ with $\sigma$ being a recursive permutation, is also
Kolmogorov-Loveland random. The Mises-Wald-Church notion of randomness
does not have this important property of randomness of staying invariant under
recursive permutation. Loveland gave the required counterexample in the cited
reference. Hence, the containment of the set of Kolmogorov-Loveland random
sequences in the set of Mises-Wald-Church random sequences is proper.

This leaves the question of whether the containment of the set of Martin-Löf
random sequences in the set of Kolmogorov-Loveland random sequences is
proper. Kolmogorov has suggested in [Problems Inform. Transmission, 5(1969),
3-4] without proof that there is a Kolmogorov-Loveland random sequence $\omega$
such that $C(\omega_{1:n}) = O(\log n)$. But Andrei Albertovich Muchnik (1958- ) (not
to be confused with A. A. Muchnik) showed that this is false since no $\omega$ with
$C(\omega_{1:n}) < cn + O(1)$ for a constant $c < 1$ can be Kolmogorov-Loveland random.
Nonetheless, containment is proper since Alexander Khanevich Shen' (1958- )
[Soviet Math. Dokl., 38:2(1989), 316-319] has shown there exists a
Kolmogorov-Loveland random sequence which is not random in Martin-Löf's sense.
Therefore, the problem of giving a satisfactory definition of infinite Martin-Löf
random sequences in the form proposed by von Mises has not yet been solved. See
also [A.N. Kolmogorov and V.A. Uspensky, Theory Probab. Appl., 32(1987), 389-
412; V.A. Uspensky, Alexei Lvovich Semenov (1950- ) and A.Kh. Shen', Russ.
Math. Surveys, 45:1(1990), 121-189].

4 Randomness as Membership of All Large Majorities

For a better understanding of the problem revealed by Ville, as in Section [?],
and its subsequent solution by P. Martin-Löf in 1966, we look at some aspects
of the methodology of probability theory.

4.1 Typicality 

Consider the sample space of all one-way infinite binary sequences generated
by fair coin tosses. Intuitively, we call a sequence 'random' iff it is 'typical'. It
is not 'typical', say 'special', if it has a particular distinguishing property. An
example of such a property is that an infinite sequence contains only finitely
many ones. There are infinitely many such sequences. But the probability that
such a sequence occurs as the outcome of fair coin tosses is zero. 'Typical' infinite
sequences will have the converse property, namely, they contain infinitely many
ones.

In fact, one would like to say that 'typical' infinite sequences will have all 
converse properties of the properties which can be enjoyed by 'special' infinite 
sequences. That is, such a sequence should belong to all large majorities. This 
can be formalized as follows. 

If a single particular property, such as containing infinitely many
occurrences of ones (or zeros), the Law of Large Numbers, or the Law of the
Iterated Logarithm, has been shown to have probability one, then one calls
this a Law of Randomness. Each sequence satisfying this property belongs to
a large majority, namely the set of all sequences satisfying the property, which
has measure one by our assumption.

Now we call an infinite sequence 'typical' or 'random' if it belongs to
all majorities of measure one, that is, it satisfies all Laws of Randomness. In
other words, each single individual 'random' infinite sequence possesses all
properties which hold with probability one for the ensemble of all infinite sequences.
This is the substance of so-called pseudo-randomness tests. For example, to
test whether the sequence of digits corresponding to the decimal expansion of
$\pi = 3.1415\ldots$ is random one tests whether the initial segment satisfies some
properties which hold with probability one for the ensemble of all sequences.

Example 2 One such property is 'normality' of a sequence as defined earlier.
Around 1909, Emile Borel called an infinite sequence of decimal digits normal in
the scale of ten if, for each k, the frequency of occurrences (possibly overlapping)
of each block y of length $k \ge 1$ in the initial segment of length n goes to the limit
$10^{-k}$ as n grows unbounded. It is known that normality is not sufficient
for randomness, since Champernowne's sequence

0.1234567891011121314151617181920...

is normal in the scale of ten. On the other hand, it is universally agreed that
a random infinite sequence must be normal. (If not, then some blocks occur
more frequently than others, which can be used to obtain better than fair odds
for prediction.)
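
As an illustration, the following sketch generates the digits of Champernowne's sequence (the concatenation of the positive integers in decimal) and tabulates the frequencies of blocks of length k in an initial segment. The function names and the segment length are assumptions for this example; convergence of the frequencies to $10^{-k}$ is slow, so a finite segment only shows the trend.

```python
from collections import Counter
from itertools import count, islice
from typing import Dict, Iterator


def champernowne_digits() -> Iterator[str]:
    """Yield the decimal digits of 1, 2, 3, ... concatenated."""
    for n in count(1):
        yield from str(n)


def block_frequencies(digits: str, k: int) -> Dict[str, float]:
    """Relative frequency of each (possibly overlapping) block of length k."""
    total = len(digits) - k + 1
    blocks = Counter(digits[i:i + k] for i in range(total))
    return {block: c / total for block, c in blocks.items()}


prefix = "".join(islice(champernowne_digits(), 15))
print(prefix)  # 123456789101112

segment = "".join(islice(champernowne_digits(), 100_000))
freqs = block_frequencies(segment, 1)
for digit in sorted(freqs):
    print(digit, round(freqs[digit], 4))
```

The sequence is generated by a very short program, so it is highly compressible even though its block frequencies approach the normal ones.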

For a particular binary sequence $\omega = \omega_1\omega_2\ldots$ let $f_n = \omega_1 + \omega_2 + \cdots + \omega_n$.
Of course, we cannot effectively test an infinite sequence. Therefore, a so-called
pseudo-randomness test examines increasingly long initial segments of
the individual sequence under consideration.

We can define a pseudo-randomness test for the normality property with k = 1
to test a candidate infinite sequence for increasing n whether the deviations
from one half 0's and 1's become too large. For example, by checking for each
successive n whether

for a fixed constant $\epsilon > 0$. (The Law of the Iterated Logarithm states that this
inequality should not hold for infinitely many n.) If within n trials in this process
we find that the inequality holds k times, then we assume the original infinite
sequence to be random with confidence at most, say, $\sum_{i=1}^{n} 1/2^i - \sum_{i=1}^{k} 1/2^i$.
(The sequence is random if the confidence is greater than zero as n goes to
infinity, and not random otherwise.)

Clearly, the number of pseudo-randomness tests we can devise is infinite.
Namely, just for the normality property alone there is a similar pseudo-randomness
test for each $k \ge 1$. ◇

But now we are in trouble. The naive execution of the above ideas in classi- 
cal mathematics is infeasible. Each individual infinite sequence induces its very 
own pseudo-randomness test which tests whether a candidate infinite sequence 
is in fact that individual sequence. Each infinite sequence forms a singleton 
set in the sample space of all infinite sequences. All complements of singleton 
sets in the sample space have probability one. The intersection of all comple- 
ments of singleton sets is clearly empty. Therefore, the intersection of all sets 
of probability one is empty. Thus, there are no random infinite sequences! 

Let us give a concrete example. Consider as sample space S the set of all
one-way infinite binary sequences. The cylinder $\Gamma_x = \{\omega : \omega = x\ldots\}$ consists
of all infinite binary sequences starting with the finite binary sequence x. For
instance, $\Gamma_\epsilon = S$, where $\epsilon$ denotes the empty sequence, that is, the sequence
with zero elements. The uniform distribution $\lambda$ on the sample space is defined
by $\lambda(\Gamma_x) = 2^{-l(x)}$. That is, the probability of an infinite binary sequence $\omega$
starting with a finite initial segment x is $2^{-l(x)}$. In probability theory it is
general practice that if a certain property, such as the Law of Large Numbers,
or the Law of the Iterated Logarithm, has been shown to have probability one,
then one calls this a Law of Randomness. For example, in our sample space the
Law of Large Numbers says that $\lim_{n \to \infty} (\omega_1 + \cdots + \omega_n)/n = 1/2$. If A is the
set of elements of S which satisfy the Law of Large Numbers, then it can be
shown that $\lambda(A) = 1$.
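
The cylinder measure can be checked on finite approximations: among all binary strings of a fixed length m, the fraction that start with x is exactly $2^{-l(x)}$. The sketch below does this by exhaustive enumeration; the function names and the choice m = 8 are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(x: str) -> Fraction:
    """Uniform measure of the cylinder of sequences starting with x: 2^(-l(x))."""
    return Fraction(1, 2 ** len(x))


def fraction_starting_with(x: str, m: int) -> Fraction:
    """Fraction of all length-m binary strings that start with x (m >= len(x))."""
    hits = sum(1 for bits in product("01", repeat=m) if "".join(bits).startswith(x))
    return Fraction(hits, 2 ** m)


print(cylinder_measure("101"))           # 1/8
print(fraction_starting_with("101", 8))  # 1/8
print(cylinder_measure(""))              # 1, the whole sample space
```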

Generalizing this idea for S with measure $\mu$, one may identify any set $B \subseteq S$,
such that $\mu(B) = 1$, with a Law of Randomness, namely, 'to be an element of
B'. Elements of S which do not satisfy the law 'to be an element of B' form
a set of measure zero, a null set. It is natural to call an element of the sample
space 'random' if it satisfies all laws of randomness. Now we are in trouble.
For each element $\omega \in S$, the set $B_\omega = S - \{\omega\}$ forms a law of randomness.
But the intersection of all these sets $B_\omega$ of probability one is empty. Thus, no
sequence would be random, if we require that all laws of randomness that 'exist'
are satisfied by a random sequence.

4.2 Randomness in Martin-Löf's Sense

The Swedish mathematician Per Martin-Löf, visiting Kolmogorov in Moscow
during 1964-1965, investigated the complexity oscillations of infinite sequences
and proposed a definition of infinite random sequences which is based on
constructive measure theory using ideas related to Kolmogorov complexity [Inform.
Contr., 9(1966), 602-619; Z. Wahrsch. Verw. Geb., 19(1971), 225-230]. This way
he succeeded in defining random infinite sequences in a manner which is free of
the above difficulties.

It turns out that a constructive viewpoint enables us to carry out this
program mathematically without such pitfalls. In practice, all laws that are proved
in probability theory to hold with probability one are effective in the formal
sense of effective computability due to A.M. Turing. A straightforward
formalization of this viewpoint is to require a law of probability to be partial recursive
in the sense that we can effectively test whether it is violated. This suggests that
the set of random infinite sequences should not be defined as the intersection of
all sets of measure one, but as the intersection of all sets of measure one with
a recursively enumerable complement. The latter intersection is again a set of
measure one with a recursively enumerable complement. Hence, there is a single
effective law of randomness which can be stated as the property 'to satisfy all
effective laws of randomness', and the infinite sequences have this property with
probability one.

(There is a related development in set theory and recursion theory, namely,
the notion of 'generic object' in the context of 'forcing'. For example, an
equivalent definition of genericity in arithmetic is being a member of the intersection of
all arithmetical sets of measure 1. There is a notion, called '1-genericity', which
calls for the intersection of all recursively enumerable sets of measure 1. This
is obviously related to the approach of Martin-Löf, and prior to it. Forcing was
introduced by Paul Cohen (1934- ) in 1963 to show the independence of the
continuum hypothesis, and using sets of positive measure as forcing conditions
is due to Robert M. Solovay soon afterwards.)

The natural formalization is to identify the effective test with a partial re- 
cursive function. This suggests that one ought to consider not the intersection 
of all sets of measure one, but only the intersection of all sets of measure one 
with recursively enumerable complements. (Such a complement set is expressed 
as the union of a recursively enumerable set of cylinders). It turns out that this 
intersection has again measure one. Hence, almost all infinite sequences satisfy
all effective laws of randomness. This notion of infinite
random sequences turns out to be related to infinite sequences of which all finite 
initial segments have high Kolmogorov complexity. 

The notion of randomness satisfied by both the Mises-Wald-Church
collectives and the Martin-Löf random infinite sequences is roughly
that effective tests cannot detect regularity. This does not mean that
a sequence may not exhibit regularities which cannot be effectively
tested. Collectives generated by Nature, as postulated by von Mises,
may very well always satisfy stricter criteria of randomness. Why
should collectives generated by quantum mechanical phenomena care
about mathematical notions of computability? Again, satisfaction 
of all effectively testable prerequisites for randomness is some form 
of regularity. Maybe nature is more lawless than adhering strictly 
to regularities imposed by the statistics of randomness. 

Until now the discussion has centered on infinite random sequences where the 
randomness is defined in terms of limits of relative frequencies. However, 

"The frequency concept based on the notion of limiting frequency 
as the number of trials increases to infinity, does not contribute 
anything to substantiate the application of the results of probability 
theory to real practical problems where we always have to deal with 
a finite number of trials." [A.N. Kolmogorov, On tables of random 
numbers, Sankhya, Series A, 25(1963), 369-376. Page 369.] 

The practical objection against both the relevance of considering infinite se- 
quences of trials and the existence of a relative frequency limit is concisely 
put in John Maynard Keynes' (1883 — 1946) famous phrase "in the long run 
we shall all be dead." It seems more appealing to try to define randomness 
for finite strings first, and only then define random infinite strings in terms of 
randomness of initial segments. 

The approach of von Mises to define randomness of infinite sequences in 
terms of unpredictability of continuations of finite initial sequences under certain 
laws (like recursive functions) did not lead to satisfying results. The Martin- 
Lof approach does lead to satisfying results, and is to a great extent equivalent 
with the Kolmogorov complexity approach. Although certainly inspired by the 
random sequence debate, the introduction of Kolmogorov complexity marks a 
definite shift of point of departure. Namely, to define randomness of sequences 
by the fact that no program from which an initial segment of the sequence can 
be computed is significantly shorter than the initial segment itself, rather than 
that no program can predict the next elements of the sequence. Thus, we change
the focus from the 'unpredictability' criterion to the 'incompressibility' criterion,
and since this will turn out to be equivalent with Martin-Löf's approach, the
'incompressibility' criterion is both necessary and sufficient.

4.3 Random Finite Sequences 

Finite sequences which cannot be effectively described by a description
significantly shorter than their literal representation are called random. Our aim is to
characterize random infinite sequences as sequences of which all initial finite
segments are random in this sense. Martin-Löf's related approach characterizes
random infinite sequences as sequences of which all initial finite segments pass
all effective randomness tests.
Initially, before the idea of complexity, Kolmogorov proposed a close
analogy to von Mises' notions in the finite domain. Consider a generalization
of place-selection rules insofar as the selection of $a_i$ can
depend on $a_j$ with $j > i$ [A.N. Kolmogorov, Sankhya, Series A,
25(1963), 369-376]. Let $\Phi$ be a finite set of such generalized place-selection
rules. Kolmogorov suggested that an arbitrary finite binary
sequence $a$ of length $n \geq m$ can be called $(m, \epsilon)$-random with respect
to $\Phi$ if there exists some $p$ such that the relative frequency of the
1's in the subsequences $a_{i_1} \ldots a_{i_r}$ with $r \geq m$, selected by applying
some $\phi$ in $\Phi$ to $a$, all lie within $\epsilon$ of $p$. (We discard $\phi$ that yield
subsequences shorter than $m$.) Stated differently, the relative frequency
in this finite subsequence is approximately (to within $\epsilon$) invariant
under any of the methods of subsequence selection that yield
subsequences of length at least $m$. Kolmogorov has shown that if the
cardinality of $\Phi$ satisfies:

then, for any $p$ and any $n \geq m$ there is some sequence $a$ of length $n$
which is $(m, \epsilon)$-random with respect to $\Phi$.

Let us borrow some ideas from statistics. We are given a certain sample space 
S with an associated distribution P. Given an element x of the sample space, 
we want to test the hypothesis 'x is a typical outcome'. Practically speaking, 
the property of being typical is the property of belonging to any reasonable 
majority. In choosing an object at random, we have confidence that this object 
will fall precisely in the intersection of all such majorities. The latter condition 
we identify with x being random. 

To ascertain whether a given element of the sample space belongs to a particular
reasonable majority we introduce the notion of a test. Generally, a test
is given by a prescription which, for every level of significance $\epsilon$, tells us for
what elements $x$ of $S$ the hypothesis '$x$ belongs to majority $M$ in $S$' should be
rejected, where $\epsilon = 1 - P(M)$. Taking $\epsilon = 2^{-m}$, $m = 1, 2, \ldots$, this amounts to
saying that we have a description of the set $V \subseteq \mathcal{N} \times S$ of nested critical regions

$V_m = \{x : (m, x) \in V\}$,

$V_m \supseteq V_{m+1}$, $m = 1, 2, \ldots$.

The condition that $V_m$ be a critical region on the significance level $\epsilon = 2^{-m}$
amounts to requiring, for all $n$,

$\sum_x \{P(x) : l(x) = n,\ x \in V_m\} \leq \epsilon$.

The complement of a critical region $V_m$ is called the $(1 - \epsilon)$ confidence interval.
If $x \in V_m$, then the hypothesis '$x$ belongs to majority $M$', and therefore the
stronger hypothesis '$x$ is random', is rejected with significance level $\epsilon$. We can
say that $x$ fails the test at the level of critical region $V_m$.

Example 3 A string $x_1 x_2 \ldots x_n$ with many initial zeros is not very random.
We can test this aspect as follows. The special test $V$ has critical regions
$V_1, V_2, \ldots$. Consider $x = 0.x_1 x_2 \ldots x_n$ as a rational number, and each critical
region as a half-open interval $V_m = [0, 2^{-m})$ in $[0, 1)$, $m = 1, 2, \ldots$. Then the
subsequent critical regions test the hypothesis '$x$ is random' by considering the
subsequent digits in the binary expansion of $x$. We reject the hypothesis on
the significance level $\epsilon = 2^{-m}$ provided $x_1 = x_2 = \cdots = x_m = 0$. Another
test for randomness of finite binary strings rejects when the relative frequency
of ones differs too much from 1/2. This particular test can be implemented
by rejecting the hypothesis of randomness of $x = x_1 x_2 \ldots x_n$ at level $\epsilon = 2^{-m}$
provided $|2 f_n - n| > g(n, m)$, where $f_n = \sum_{i=1}^{n} x_i$, and $g(n, m)$ is the least
number determined by the requirement that the number of binary strings $x$ of
length $n$ for which this inequality holds is at most $2^{n-m}$. ◇
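The two tests of Example 3 can be tried out on short strings. The following Python sketch is only an illustration under stated assumptions: binary strings are represented as Python `str` objects over '0' and '1', and all function names (`leading_zeros_level`, `freq_threshold`, `region_size`, and so on) are hypothetical helpers introduced here, not notation from the text.

```python
from itertools import product
from math import comb


def leading_zeros_level(x: str) -> int:
    """Largest m such that x_1 = ... = x_m = 0, i.e. x lies in V_m."""
    return len(x) - len(x.lstrip("0"))


def leading_zeros_rejects(x: str, m: int) -> bool:
    """Reject 'x is random' at level 2**-m if x starts with m zeros."""
    return leading_zeros_level(x) >= m


def freq_threshold(n: int, m: int) -> int:
    """Least g with #{x of length n : |2*f_n - n| > g} <= 2**(n - m)."""
    for g in range(n + 1):
        count = sum(comb(n, f) for f in range(n + 1) if abs(2 * f - n) > g)
        # count <= 2**(n - m), written without fractions
        if count * 2 ** m <= 2 ** n:
            return g
    return n  # not reached: g = n always gives count 0


def freq_rejects(x: str, m: int) -> bool:
    """Reject at level 2**-m if the number of ones is too far from n/2."""
    n = len(x)
    return abs(2 * x.count("1") - n) > freq_threshold(n, m)


def region_size(rejects, n: int, m: int) -> int:
    """Brute-force count of the strings of length n rejected at level 2**-m."""
    return sum(rejects("".join(bits), m) for bits in product("01", repeat=n))


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, region_size(leading_zeros_rejects, 8, m), region_size(freq_rejects, 8, m))
```

For each level $m$ the brute-force count stays within the bound $2^{n-m}$ required of a critical region; the exhaustive enumeration is of course only feasible for small $n$.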

In practice, statistical tests are effective prescriptions such that we can compute,
at each level of significance, for what strings the associated hypothesis
should be rejected. It would be hard to imagine what use it would be in statistics
to have tests that are not effective in the sense of computability theory.

Definition 2 Let $P$ be a recursive probability distribution on the sample space
$\mathcal{N}$. A total function $\delta : \mathcal{N} \rightarrow \mathcal{N}$ is a $P$-test (Martin-Löf test for randomness) if:

1. $\delta$ is enumerable (the set $V = \{(m, x) : \delta(x) \geq m\}$ is recursively enumerable);
and

2. $\sum \{P(x) : \delta(x) \geq m,\ l(x) = n\} \leq 2^{-m}$, for all $n$.

The critical regions associated with the common statistical tests are present
in the form of the sequence $V_1 \supseteq V_2 \supseteq \cdots$, where $V_m = \{x : \delta(x) \geq m\}$, for
$m \geq 1$. Nesting is assured since $\delta(x) \geq m + 1$ implies $\delta(x) \geq m$. Each set $V_m$ is
recursively enumerable because of Item 1.

A particularly important case is when $P$ is the uniform distribution, defined by
$L(x) = 2^{-2l(x)}$. The restriction of $L$ to strings of length $n$ is defined by $L_n(x) =
2^{-n}$ for $l(x) = n$ and $0$ otherwise. (By definition, $L_n(x) = L(x \mid l(x) = n)$.)
Then, Item 2 can be rewritten as $\sum_{x \in V_m} L_n(x) \leq 2^{-m}$, which is the same as

$d(\{x : l(x) = n,\ x \in V_m\}) \leq 2^{n-m}$.

In this case we often speak simply of a test, with the uniform distribution $L$
understood.

In statistical tests membership of $(m, x)$ in $V$ can usually be determined
in time polynomial in $l(m) + l(x)$.
Example 4 The previous test examples can be rephrased in terms of Martin-Löf
tests. Let us try a more subtle example. A real number such that all bits in
odd positions in its binary representation are 1's is not random with respect to
the uniform distribution. To show this we need a test which detects sequences
of the form $x = 1x_2 1x_4 1x_6 1x_8 \ldots$. Define a test $\delta$ by

$\delta(x) = \max\{i : x_1 = x_3 = \cdots = x_{2i-1} = 1\}$,

and $\delta(x) = 0$ if $x_1 = 0$. For example: $\delta(01111) = 0$; $\delta(10011) = 1$; $\delta(11011) = 1$;
$\delta(10100) = 2$; $\delta(11111) = 3$. To show that $\delta$ is a test we have to show that
$\delta$ satisfies the definition of a test. Clearly, $\delta$ is enumerable (even recursive).
If $\delta(x) \geq m$ where $l(x) = n \geq 2m$, then there are $2^{m-1}$ possibilities for the
$(2m - 1)$-length prefix of $x$, and $2^{n-(2m-1)}$ possibilities for the remainder of $x$.
Therefore, $d\{x : \delta(x) \geq m,\ l(x) = n\} \leq 2^{n-m}$. ◇
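The test of Example 4 is simple enough to check by exhaustive enumeration. The sketch below is a hedged illustration, assuming strings are Python `str` objects over '0' and '1'; the names `delta` and `region_size` are hypothetical helpers chosen here for readability.

```python
from itertools import product


def delta(x: str) -> int:
    """max{i : x_1 = x_3 = ... = x_{2i-1} = 1} (1-based positions); 0 if x_1 = 0."""
    i = 0
    # x[2 * i] is the bit in odd position 2i + 1 of the text's 1-based numbering
    while 2 * i < len(x) and x[2 * i] == "1":
        i += 1
    return i


def region_size(n: int, m: int) -> int:
    """Number of strings x of length n with delta(x) >= m (brute force)."""
    return sum(delta("".join(bits)) >= m for bits in product("01", repeat=n))


if __name__ == "__main__":
    for x in ("01111", "10011", "11011", "10100", "11111"):
        print(x, delta(x))
    print(all(region_size(n, m) <= 2 ** (n - m)
              for n in range(1, 11) for m in range(1, n + 1)))
```

The first loop recomputes the five sample values listed in the example, and the final line checks the bound $d\{x : \delta(x) \geq m, l(x) = n\} \leq 2^{n-m}$ for all lengths up to 10.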



Definition 3 A universal Martin-Löf test for randomness with respect to distribution
$P$, a universal $P$-test for short, is a test $\delta_0(\cdot|P)$ such that for each
$P$-test $\delta$, there is a constant $c$, such that for all $x$, we have $\delta_0(x|P) \geq \delta(x) - c$.

We started out with the objective to establish in what sense incompressible
strings may be called random. Kolmogorov considered a notion of randomness
deficiency $\delta(x|A) = l(d(A)) - C(x|A)$ of a string $x$ relative to a finite set $A$. With
$A$ the set of strings of length $n$ and $x \in A$ we find $\delta(x|A) = \delta(x|n) = n - C(x|n)$.

Theorem 1 The function $f(x) = n - C(x|n) - 1$ is a universal $L$-test with $L$
the uniform distribution over $\{0, 1\}^*$ and $n = l(x)$.
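As a brief sketch of why the function $f$ of Theorem 1 meets the counting form of the test condition for the uniform distribution, note that every $x$ with $C(x|n) \leq k$ has its own binary program of length at most $k$, and there are only $2^{k+1} - 1$ such programs. Taking $k = n - m - 1$ gives the bound below. Universality of $f$ requires a separate argument, which is not reproduced here.

```latex
\begin{align*}
d\{x : l(x) = n,\ f(x) \geq m\}
  &= d\{x : l(x) = n,\ C(x|n) \leq n - m - 1\} \\
  &\leq \sum_{i=0}^{n-m-1} 2^{i} \;=\; 2^{n-m} - 1 \;<\; 2^{n-m}.
\end{align*}
```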



4.4 Random Infinite Sequences 

Consider the question of how C behaves in terms of increasingly long initial 
segments of a fixed infinite binary sequence (or string) $\omega$. For instance, is it
monotone in the sense that $C(\omega_{1:m}) \leq C(\omega_{1:n})$, or $C(\omega_{1:m}|m) \leq C(\omega_{1:n}|n)$, for
all infinite binary sequences $\omega$ and all $m \leq n$? We have already seen that the
answer is negative in both cases. A similar effect arises when we try to use 
Kolmogorov complexity to solve the problem of finding a proper definition of 
random infinite sequences (collectives) according to the task already set by von 
Mises in 1919, Section |. 

It is seductive to call an infinite binary sequence $\omega$ random if there is a
constant $c$ such that, for all $n$, the $n$-length prefix $\omega_{1:n}$ has $C(\omega_{1:n}) \geq n - c$.
However, such sequences do not exist.



As in Section 4.3, we define a test for randomness. However, this time
the test will not be defined on the entire sequence (which is impossible for an 
effective test and an infinite sequence), but for each finite binary string. The 
value of the test for an infinite sequence is then defined as the maximum of 
the values of the test on all prefixes. Since this suggests an effective process of
sequential approximations, we call it a sequential test. Below, we need to use 
notions of continuous sample spaces and measures. 

Definition 4 Let $\mu$ be a recursive probability measure on the sample space
$\{0, 1\}^\infty$. A total function $\delta : \{0, 1\}^\infty \rightarrow \mathcal{N} \cup \{\infty\}$ is a sequential $\mu$-test
(sequential Martin-Löf $\mu$-test for randomness) if:

1. $\delta(\omega) = \sup_{n \in \mathcal{N}} \{\gamma(\omega_{1:n})\}$, where $\gamma : \mathcal{N} \rightarrow \mathcal{N}$ is a total enumerable function
($V = \{(m, y) : \gamma(y) \geq m\}$ is a recursively enumerable set); and

2. $\mu\{\omega : \delta(\omega) \geq m\} \leq 2^{-m}$, for each $m \geq 0$.

If $\mu$ is the uniform measure $\lambda$, then we often use simply sequential test.

We can require $\gamma$ to be a recursive function without changing the
notion of a sequential $\mu$-test. By definition, for each enumerable
function $\gamma$ there exists a recursive function $\phi$ with $\phi(x, k)$ nondecreasing
in $k$ such that $\lim_{k \rightarrow \infty} \phi(x, k) = \gamma(x)$. Define a recursive
function $\gamma'$ by $\gamma'(\omega_{1:n}) = \phi(\omega_{1:m}, k)$ with $\langle m, k \rangle = n$. Then,

$\sup_{n \in \mathcal{N}} \{\gamma'(\omega_{1:n})\} = \sup_{n \in \mathcal{N}} \{\gamma(\omega_{1:n})\}$.

Example 5 Consider $\{0, 1\}^\infty$ with the uniform measure $\lambda(x) = 2^{-l(x)}$. An
example of a sequential $\lambda$-test is to test whether there are 1's in even positions
of $\omega \in \{0, 1\}^\infty$. Let



The number of $x$'s of length $n$ such that $\gamma(x) \geq m$ is at most $2^{n/2}$ for any
$m \geq 1$. Therefore, $\lambda\{\omega : \delta(\omega) \geq m\} \leq 2^{-m}$ for $m > 0$. For $m = 0$,
$\lambda\{\omega : \delta(\omega) \geq m\} \leq 2^{-m}$ holds trivially. A sequence $\omega$ is random with respect to
this test if $\delta(\omega) < \infty$. Thus, a sequence $\zeta$ with 0's in all even locations will have
$\delta(\zeta) = \infty$, and it will fail the test and hence $\zeta$ is not random with respect to
this test. Notice that this is not a very strong test of randomness. For example,
a sequence $\eta = 010^\infty$ will pass $\delta$ and be considered random with respect to this
test. This test only filters out some nonrandom sequences with all 0's at the
even locations and cannot detect other kinds of regularities. ◇
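To experiment with a sequential test of the kind described in Example 5, one can work with finite prefixes. The sketch below rests on a loudly stated assumption: the function `gamma` used here (half the prefix length if every even position holds 0, and 0 otherwise) is a hypothetical choice that is merely consistent with the properties stated in the example; it is not taken from the text. The helper `delta_on_prefix` is likewise an illustrative name.

```python
from itertools import product


def gamma(x: str) -> int:
    """ASSUMED gamma: floor(l(x)/2) if all even positions (1-based) of x are 0, else 0."""
    evens = x[1::2]  # the bits x_2, x_4, ...
    return len(evens) if "1" not in evens else 0


def delta_on_prefix(x: str) -> int:
    """Maximum of gamma over all prefixes of x.

    For any infinite omega extending x this is a lower bound on
    delta(omega) = sup_n gamma(omega_{1:n}).
    """
    return max(gamma(x[:k]) for k in range(len(x) + 1))


def fraction_rejected(n: int, m: int) -> float:
    """Fraction of length-n prefixes already forcing delta(omega) >= m."""
    hits = sum(delta_on_prefix("".join(b)) >= m for b in product("01", repeat=n))
    return hits / 2 ** n


if __name__ == "__main__":
    print(delta_on_prefix("0" * 12))          # grows with the prefix length
    print(delta_on_prefix("01" + "0" * 10))   # a prefix of eta = 010^infinity
    print([fraction_rejected(8, m) for m in range(1, 5)])
```

Under this assumed `gamma`, the fraction of prefixes of length $2m$ or more that reach level $m$ is $2^{-m}$, in agreement with the measure condition of a sequential $\lambda$-test.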

We continue the general theory of sequential testing. If $\delta(\omega) = \infty$, then we
say that $\omega$ fails $\delta$, or that $\delta$ rejects $\omega$. Otherwise, $\omega$ passes $\delta$. By definition, the
set of $\omega$'s that is rejected by $\delta$ has $\mu$-measure zero, and, conversely, the set of
$\omega$'s that pass $\delta$ has $\mu$-measure one.

Suppose for a test $\delta$ holds $\delta(\omega) = m$. Then there is a prefix $y$ of $\omega$, with
$l(y)$ minimal, such that $\gamma(y) = m$ for the $\gamma$ used to define $\delta$. Then, obviously,
each infinite sequence $\zeta$ that starts with $y$ has $\delta(\zeta) \geq m$. The set of such $\zeta$ is
$\Gamma_y = \{\zeta : \zeta = y\rho,\ \rho \in \{0, 1\}^\infty\}$, the cylinder generated by $y$. Geometrically
speaking, $\Gamma_y$ can be viewed as the set of all real numbers $0.y \ldots$ corresponding
to the half-open interval $I_y = [0.y, 0.y + 2^{-l(y)})$. (For the uniform measure,
$\lambda(\Gamma_y) = 2^{-l(y)}$, the common length of $I_y$.)

In terms of common statistical tests, the critical regions are formed by the
nested sequence $V_1 \supseteq V_2 \supseteq \cdots$, where $V_m$ is defined as $V_m = \{\omega : \delta(\omega) \geq m\}$,
for $m \geq 1$. We can formulate the definition of $V_m$ as

$V_m = \bigcup \{\Gamma_y : (m, y) \in V\}$.

In geometric terms, $V_m$ is the union of a set of subintervals of $[0, 1)$. Since $V$ is
recursively enumerable, so is the set of intervals whose union is $V_m$. For each
critical section we have $\mu(V_m) \leq 2^{-m}$ (in the measure we count overlapping
intervals only once).

Now we can reformulate the notion of passing a sequential test $\delta$ with associated
set $V$:

$\delta(\omega) < \infty$ iff $\omega \notin \bigcap_{m=1}^{\infty} V_m$.
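The geometric picture of cylinders as half-open intervals, with overlapping intervals counted only once, can be made concrete for the uniform measure. The following is a minimal sketch; the function names `cylinder_interval` and `union_measure` are hypothetical, and exact rational arithmetic from Python's `fractions` module stands in for the real numbers.

```python
from fractions import Fraction


def cylinder_interval(y: str):
    """The half-open interval I_y = [0.y, 0.y + 2**-l(y)) as a pair of Fractions."""
    lo = Fraction(int(y, 2), 2 ** len(y)) if y else Fraction(0)
    return lo, lo + Fraction(1, 2 ** len(y))


def union_measure(ys) -> Fraction:
    """Uniform measure of the union of the cylinders Gamma_y, overlaps counted once.

    Two cylinders are either nested or disjoint, so it suffices to keep
    the strings that have no proper prefix in the collection.
    """
    ys = set(ys)
    minimal = [y for y in ys if not any(y[:k] in ys for k in range(len(y)))]
    return sum((Fraction(1, 2 ** len(y)) for y in minimal), Fraction(0))


if __name__ == "__main__":
    print(cylinder_interval("01"))
    print(union_measure(["0", "01", "11"]))
```

In the usage example the cylinder of "01" lies inside that of "0", so only the cylinders of "0" and "11" contribute to the measure of the union.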



Definition 5 Let $\mathcal{V}$ be the set of all sequential $\mu$-tests. An infinite binary
sequence $\omega$, or the binary represented real number $0.\omega$, is called $\mu$-random if it
passes all sequential $\mu$-tests:

$\omega \notin \bigcup_{V \in \mathcal{V}} \bigcap_{m=1}^{\infty} V_m$.

For each sequential $\mu$-test $V$, we have $\mu(\bigcap_{m=1}^{\infty} V_m) = 0$, by Definition 4.
We call $\bigcap_{m=1}^{\infty} V_m$ a constructive $\mu$-null set. Since there are only countably
infinitely many sequential $\mu$-tests $V$, it follows from standard measure theory
that

$\mu\left(\bigcup_{V \in \mathcal{V}} \bigcap_{m=1}^{\infty} V_m\right) = 0$,

and we call the set $U = \bigcup_{V \in \mathcal{V}} \bigcap_{m=1}^{\infty} V_m$ the maximal constructive $\mu$-null set.

Similar to Section 4.3 we construct an enumerable function $\delta_0(\omega|\mu)$, the
universal sequential $\mu$-test which incorporates (majorizes) all sequential $\mu$-tests
$\delta_1, \delta_2, \ldots$, and that corresponds to $U$.

Definition 6 A universal sequential $\mu$-test $f$ is a sequential $\mu$-test such that
for each sequential $\mu$-test $\delta_i$ there is a constant $c \geq 0$ such that for all $\omega \in \{0, 1\}^\infty$,
we have $f(\omega) \geq \delta_i(\omega) - c$.

Theorem 2 There is a universal sequential $\mu$-test (denoted as $\delta_0(\cdot|\mu)$).

Definition 7 Let the sample space $\{0, 1\}^\infty$ be distributed according to $\mu$, and
let $\delta_0(\cdot|\mu)$ be a universal sequential $\mu$-test. An infinite binary sequence $\omega$ is
$\mu$-random in the sense of Martin-Löf, if $\delta_0(\omega|\mu) < \infty$. We call such a sequence
simply random, where both $\mu$ and Martin-Löf are understood. (This is particularly
interesting when $\mu$ is the uniform measure.)

Note that this definition does not depend on the choice of the particular
universal sequential $\mu$-test with respect to which the level is defined. Hence,
the line between random and nonrandom infinite sequences is drawn sharply
without dependence on a reference $\mu$-test. It is easy to see that the set of infinite
sequences, which are not random in the sense of Martin-Löf, forms precisely the
maximal constructive $\mu$-null set of $\mu$-measure zero we have constructed above.
Therefore,

Theorem 3 Let $\mu$ be a recursive measure. The set of $\mu$-random infinite binary
sequences has $\mu$-measure one.

We say that the universal sequential $\mu$-test $\delta_0(\cdot|\mu)$ rejects an infinite sequence
with probability zero, and we conclude that a randomly selected infinite
sequence passes all effectively testable laws of randomness with probability one.

The main question remaining is the following. Let $\lambda$ be the uniform
measure. Can we formulate a universal sequential $\lambda$-test in terms
of complexity? In Theorem 1 the universal (nonsequential) test is
expressed that way. The most obvious candidate for the universal
sequential test would be $f(\omega) = \sup_{n \in \mathcal{N}} \{n - C(\omega_{1:n})\}$, but it is
improper. To see this, it is simplest to notice that $f(\omega)$ would declare
all infinite $\omega$ to be nonrandom since $f(\omega) = \infty$, for all $\omega$, because of
the complexity oscillations we have discussed above. The same would
be the case for $f(\omega) = \sup_{n \in \mathcal{N}} \{n - C(\omega_{1:n}|n)\}$, by about the same
proof. It is difficult to express a universal sequential test precisely in
terms of $C$-complexity. Yet it is easy to separate the random infinite
sequences from the nonrandom ones in terms of 'prefix' complexity,
see below.

The idea that random infinite sequences are those sequences such that the com- 
plexity of the initial n-length segments is at most a fixed additive constant below 
n, for all n, is one of the first-rate ideas in the area of Kolmogorov complexity. 
In fact, this was one of the motivations for Kolmogorov to invent Kolmogorov
complexity in the first place. We have seen in Section 4.4 that this does not
work for the plain Kolmogorov complexity $C(\cdot)$, due to the complexity oscillations.
The next result is important, and is a culmination of the theory. For
prefix complexity $K(\cdot)$ it is indeed the case that random sequences are those
sequences for which the complexity of each initial segment is at least its length,
up to a fixed additive constant.

Theorem 4 An infinite binary sequence $\omega$ is random with respect to the uniform
measure iff there is a constant $c$ such that $K(\omega_{1:n}) \geq n - c$, for all $n$.
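Prefix complexity is not computable, so the criterion of Theorem 4 cannot be evaluated directly. As a rough, hedged illustration only, the sketch below replaces $K$ by the output length of a general-purpose compressor (Python's standard `zlib` module), which can at best expose some regularities and proves nothing about randomness. The name `deficiency_estimate` is a hypothetical helper, and the pseudo-random input comes from Python's `random` module, which is itself algorithmic and therefore not random in the sense of the theorem.

```python
import random
import zlib


def deficiency_estimate(bits: str) -> int:
    """n minus the zlib-compressed length in bits of an n-bit string.

    A large positive value exhibits compressibility; a value near or
    below zero only means that this particular compressor found nothing.
    """
    n = len(bits)
    packed = int(bits, 2).to_bytes((n + 7) // 8, "big") if n else b""
    return n - 8 * len(zlib.compress(packed, 9))


if __name__ == "__main__":
    n = 80000
    rng = random.Random(0)
    noise = format(rng.getrandbits(n), "0{}b".format(n))
    print("all zeros   :", deficiency_estimate("0" * n))
    print("alternating :", deficiency_estimate("01" * (n // 2)))
    print("pseudo-random:", deficiency_estimate(noise))
```

The regular prefixes show a deficiency close to their full length, whereas the compressor finds essentially nothing to remove from the pseudo-random prefix, even though a short program (the generator with its seed) produces it.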

Although the following discussion is a bit beyond the scope of this article, it 
is important to understand the raised issues. There are different 'families' of 
tests which characterize precisely the same class of random infinite sequences. 
The sequential tests are but one type. The test in Theorem 4 is an example
of an integral test (a universal one) with respect to the uniform measure as
defined. The introduction of different families of tests like martingale tests and
integral tests requires special machinery. It is of the utmost importance to
realize that an infinite sequence is random only in the sense of being random
with respect to a given distribution. For example, if we have the sample space
$\{0, 1\}^\infty$ and a probability measure $\mu$ such that $\Gamma_x$ has $\mu$-measure one for all
sequences $x = 00 \ldots 0$ and measure zero for all $\Gamma_y$ with $y \neq 00 \ldots 0$, then the
only $\mu$-random infinite sequence is $\omega = 00 \ldots$.

Thus, in many applications we are not really interested in randomness with
respect to the uniform distribution, but in randomness with respect to a given
recursive distribution $\mu$. It is therefore important to have explicit expressions
for $\mu$-randomness. In particular, we have

Corollary 1 Let $\mu$ be a recursive measure. The function

$\rho_0(\omega|\mu) = \sup_{\omega \in \Gamma_x} \{-K(x|\mu) - \log \mu(x)\}$

is a universal integral $\mu$-test.

Example 6 With respect to the special case of the uniform distribution $\lambda$,
Theorem || sets $\rho_0(\omega|\lambda) = \sup_{n \in \mathcal{N}} \{n - K(\omega_{1:n})\}$ up to a constant additional
term. This is the expression we found already in Theorem 4. ◇

Such theories of exact expressions for tests which separate the random infi- 
nite sequences from the nonrandom ones with respect to any recursive measure 
are presented in ||lo|| , together with applications in inductive reasoning (the min- 
imum description length inference in statistics) and physics and computation. In 
the latter, one expresses a variation of coarse-grained Boltzmann entropy of an 
individual micro state of the system in terms of a generalization of Kolmogorov 
complexity called 'algorithmic entropy' rather than conventional coarse-grained 
Boltzmann entropy in statistical mechanics which expresses the uncertainty of 
the micro state when it is constrained only by the macro state comprising an 
ensemble of micro states. 
Example 7 It is impossible to construct an infinite random sequence by
algorithmic means. But, using the reference prefix machine $U$, we can define a
particular random infinite binary sequence in a more or less natural way. The
halting probability is the real number $\Omega$ defined by

$\Omega = \sum_{U(p) < \infty} 2^{-l(p)}$,

the sum taken over all inputs $p$ for which the reference machine $U$ halts. Since
$U$ halts for some $p$, we have $\Omega > 0$. Because $U$ is a prefix machine, the set of its
programs forms a prefix-code and by Kraft's Inequality, see [0, we find $\Omega \leq 1$.
Actually, $\Omega < 1$ since $U$ does not always halt.

We call $\Omega$ the halting probability because it is the probability that $U$ halts
if its program is provided by a sequence of fair coin flips. G.J. Chaitin observed
that the number $\Omega$ has interesting properties. In the first place, the binary
representation of the real number $\Omega$ encodes the halting problem very compactly.
Denote the initial $n$-length segment of $\Omega$ after the decimal dot by $\Omega_{1:n}$. If $\Omega$
were a terminating binary rational number, then we use the representation with
infinitely many zeros, so that $\Omega < \Omega_{1:n} + 2^{-n}$.

Claim 1 Let $p$ be a binary string of length at most $n$. Given $\Omega_{1:n}$, it is decidable
whether the reference prefix machine $U$ halts on input $p$.

Proof. Clearly,

$\Omega_{1:n} \leq \Omega < \Omega_{1:n} + 2^{-n}$. (4)

Dovetail the computations of $U$ on all inputs as follows. The first phase consists
of $U$ executing one step of the computation on the first input. In the second
phase, $U$ executes the second step of the computation on the first input and
the first step of the computation of the second input. Phase $i$ consists of $U$
executing the $j$th step of the computation on the $k$th input, for all $j$ and $k$ such
that $j + k = i$. We start with an approximation $\Omega' := 0$. Execute phases 1, 2,
.... Whenever any computation of $U$ on some input $p$ terminates, we improve
our approximation of $\Omega$ by executing

$\Omega' := \Omega' + 2^{-l(p)}$.

This process eventually yields an approximation $\Omega'$ of $\Omega$, such that $\Omega' \geq \Omega_{1:n}$.
If $p$ is not among the halted programs which contributed to $\Omega'$, then $p$ will
never halt. With a new $p$ halting we add a contribution of $2^{-l(p)} \geq 2^{-n}$ to the
approximation of $\Omega$, contradicting Equation (4) by

$\Omega \geq \Omega' + 2^{-l(p)} \geq \Omega_{1:n} + 2^{-n}$.

□
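The procedure in the proof of Claim 1 can be illustrated on a toy example. The sketch below is emphatically not the reference machine $U$: `TOY_MACHINE` is a hypothetical finite table that simply lists, for a small prefix-free set of programs, after how many steps each one halts (or `None` if it never does). Its halting probability can therefore be computed from the table, which is impossible for a genuine universal machine. All function names are assumptions made for this illustration.

```python
from fractions import Fraction

# Hypothetical toy "prefix machine": program -> steps until halting,
# or None if the program never halts. The program set is prefix-free.
TOY_MACHINE = {"0": 3, "10": None, "110": 7, "1110": None, "1111": 2}


def omega(machine) -> Fraction:
    """Halting probability of the toy machine (computable only because it is a table)."""
    return sum((Fraction(1, 2 ** len(p)) for p, t in machine.items() if t is not None),
               Fraction(0))


def first_bits(value: Fraction, n: int) -> Fraction:
    """The truncation 0.b_1...b_n of a number in [0, 1), as an exact Fraction."""
    return Fraction(int(value * 2 ** n), 2 ** n)


def halting_programs(machine, omega_n: Fraction, n: int):
    """Run all programs step by step until the lower approximation reaches omega_n.

    Every program of length <= n that has not halted by then never halts.
    """
    approx, halted, step = Fraction(0), set(), 0
    while approx < omega_n:
        step += 1
        for p, t in machine.items():
            if t == step:  # program p halts at exactly this step
                halted.add(p)
                approx += Fraction(1, 2 ** len(p))
    return {p for p in halted if len(p) <= n}


if __name__ == "__main__":
    om = omega(TOY_MACHINE)
    for n in range(1, 5):
        print(n, sorted(halting_programs(TOY_MACHINE, first_bits(om, n), n)))
```

Because the toy table is finite, "phase" is simplified here to "advance every program by one step"; the stopping rule (stop once the approximation has reached the first $n$ bits) is the one used in the proof.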

It follows that the binary expansion of $\Omega$ is an incompressible sequence.

Claim 2 There is a constant $c$ such that $K(\Omega_{1:n}) \geq n - c$ for all $n$.

That is, $\Omega$ is a particular random real, and one that is naturally
defined to boot. That $\Omega$ is random implies that it is not computable,
and therefore transcendental. Namely, if it were computable, then
$K(\Omega_{1:n}|n) = O(1)$, which contradicts Claim 2. By the way,
irrationality of $\Omega$ implies that both inequalities in Equation (4) are strict.

Proof. From Claim 1 it follows that, given $\Omega_{1:n}$, one can calculate all
programs $p$ of length not greater than $n$ for which the reference prefix machine
$U$ halts. For each $x$ which is not computed by any of these halting programs
the shortest program $x^*$ has size greater than $n$, that is, $K(x) > n$. Hence, we
can construct a recursive function $\phi$ computing such high complexity $x$'s from
initial segments of $\Omega$, such that for all $n$,

$K(\phi(\Omega_{1:n})) > n$.

Given a description of $\phi$ in $c$ bits, for each $n$ we can compute $\phi(\Omega_{1:n})$ from $\Omega_{1:n}$,
which means

$K(\Omega_{1:n}) + c > n$,

which was what we had to prove. □

Corollary 2 By Theorem 4 we find that $\Omega$ is random with respect to the
uniform measure.

Knowing the first 10,000 bits of $\Omega$ enables us to solve the halting
of all programs of less than 10,000 bits. This includes programs
looking for counterexamples to Fermat's Last Theorem (presumed
solved affirmatively at this time of writing), Goldbach's Conjecture,
Riemann's Hypothesis, and most other famous conjectures in mathematics
which can be refuted by single finite counterexamples. Moreover,
for all axiomatic mathematical theories which can be expressed
compactly enough to be conceivably interesting to human beings, say
in less than 10,000 bits, $\Omega_{1:10,000}$ can be used to decide for every statement
in the theory whether it is true, false, or independent. Finally,
knowledge of $\Omega_{1:n}$ suffices to determine whether $K(x) \leq n$ for each
finite binary string $x$. Thus, $\Omega$ is truly the number of Wisdom, and
'can be known of, but not known, through human reason' [Charles
H. Bennett and Martin Gardner, Scientific American, 241:11(1979),
20-34]. But even if you possess $\Omega_{1:10,000}$, you cannot use it except
by spending time of a thoroughly unrealistic nature. (The time $t(n)$
it takes to find all halting programs of length less than $n$ from $\Omega_{1:n}$
grows faster than any recursive function.)

◇
Example 8 Let us look at another example of an infinite random sequence, this
time defined in terms of Diophantine equations. These are algebraic equations
of the form $X = 0$, where $X$ is built up from nonnegative integer variables
and nonnegative integer constants by a finite number of additions ($A + B$) and
multiplications ($A \times B$). The best known examples are $x^n + y^n = z^n$, where
$n = 1, 2, \ldots$.

Pierre de Fermat (1601-1665) has stated that this equation has no 
solution in integers x, y, and z for n an integer greater than 2. (For 
$n = 2$ there exist solutions, for instance $3^2 + 4^2 = 5^2$.) However,
he did not supply a proof of this assertion, often called Fermat's 
Last Theorem. After 350 years of withstanding concerted attacks 
to come up with a proof or disproof, the problem had become a 
celebrity among unsolved mathematical problems. (At this time of 
writing, October 1995, there is a serious claim that the problem has 
been settled.) Suppose we substitute all possible values for $x, y, z$
with $x + y + z \leq n$, for $n = 3, 4, \ldots$. This way, we recursively
enumerate all solutions of Fermat's equation. Hence, such a process 
will eventually give a counterexample to Fermat's conjecture if one 
exists, but the process will never yield conclusive evidence if the 
conjecture happens to be true. 

In his famous address to the International Mathematical Congress in 1900, D. 
Hilbert proposed twenty-three mathematical problems as a program to direct 
the mathematical efforts in the twentieth century. The tenth problem asks for an 
algorithm which, given an arbitrary Diophantine equation, produces either an 
integer solution for this equation or indicates that no such solution exists. After 
a great deal of preliminary work by other mathematicians, the Russian math- 
ematician Yuri V. Matijasevich finally showed that no such algorithm exists. 
But suppose we weaken the problem as follows. First, effectively enumerate all
Diophantine equations, and consider the characteristic sequence $\Delta = \Delta_1 \Delta_2 \ldots$,
defined by $\Delta_i = 1$ if the $i$th Diophantine equation is solvable, and $0$ otherwise.
Obviously, $C(\Delta_{1:n}) \leq n + O(1)$.

There is an algorithm to decide the solvability of the first $n$ Diophantine
equations given about $\log n$ bits extra information. Namely, given the number
$m \leq n$ of soluble equations in the first $n$ equations, we can recursively enumerate
solutions to the first $n$ equations in the obvious way until we have found $m$
solvable equations. The remaining equations are unsolvable. (This is a particular
case of a more general Lemma due to J.M. Barzdins known as Barzdins'
Lemma, [10].)

A.N. Kolmogorov observed that this shows that the solubility of the enumerated
Diophantine equations is interdependent in some way since $C(\Delta_{1:n}|n) \leq
\log n + c$, for some fixed constant $c$. The compressibility of the characteristic
sequence means in fact that the solvability of Diophantine equations is highly
interdependent; it is impossible for an even moderately random sequence of
them to be solvable and the remainder unsolvable.
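The counting trick behind the $\log n$ bound (knowing how many of the first $n$ equations are solvable is enough to find out which ones) can be sketched on a toy family. Everything specific below is an assumption made for illustration: the equations are hypothetical one-variable examples given as integer-valued Python functions whose zeros are the solutions, and the names `solvable_indices` and `characteristic_bits` are invented helpers. If the supplied count is larger than the true number of solvable equations, the search never terminates, just as in the general setting.

```python
from itertools import count


def solvable_indices(equations, m: int):
    """Given that exactly m of the equations are solvable, find which ones.

    Candidate values x = 0, 1, 2, ... are tried on all equations in turn
    until m distinct equations have been seen to have a solution.
    """
    found = set()
    for x in count():
        if len(found) == m:
            return found
        for i, eq in enumerate(equations):
            if i not in found and eq(x) == 0:
                found.add(i)


def characteristic_bits(equations, m: int) -> str:
    """Initial segment of the characteristic sequence, reconstructed from m alone."""
    solvable = solvable_indices(equations, m)
    return "".join("1" if i in solvable else "0" for i in range(len(equations)))


if __name__ == "__main__":
    # Toy family: x*x - c = 0 for c = 1..8; solvable exactly when c is a square.
    toy = [lambda x, c=c: x * x - c for c in range(1, 9)]
    print(characteristic_bits(toy, 2))
```

The only information supplied beyond the enumeration itself is the number $m$, which takes about $\log n$ bits to write down.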

G.J. Chaitin proposed to replace the question of mere solubility by the ques- 
tion of whether there are finitely many or infinitely many different solutions. 
That is, no matter how many solutions we find for a given equation, by itself 
this can give no information on the question to be decided. It turns out that the 
set of indices of the Diophantine equations with infinitely many different solu- 
tions is not recursively enumerable. In particular, in the characteristic sequence 
each initial segment of length n has Kolmogorov complexity of about n. 

Claim 3 There is an (exponential) Diophantine equation

$A(n, x_1, x_2, \ldots, x_m) = 0$

which has only finitely many solutions $x_1, x_2, \ldots, x_m$ if the $n$th bit of $\Omega$ is zero
and which has infinitely many solutions $x_1, x_2, \ldots, x_m$ if the $n$th bit of $\Omega$ is one.

The role of exponential Diophantine equations should be clarified. 
Yu.V. Matijasevich [Soviet Math. Dokl, 11(1970), 354-357] proved 
that every recursively enumerable set has a polynomial Diophan- 
tine representation. J. P. Jones and Yu.V. Matijasevich [J. Symbol. 
Logic, 49(1984), 818-829] proved that every recursively enumerable 
set has a singlefold exponential Diophantine representation. It is 
not known whether singlefold representation (which is important in 
our application) is always possible without exponentiation. See also 
G.J. Chaitin, Algorithmic Information Theory, Cambridge Univer- 
sity Press, 1987. 

Proof. By dovetailing the running of all programs of the reference prefix
machine $U$ in the obvious way, we find a recursive sequence of rational numbers
$\omega_1 \leq \omega_2 \leq \cdots$ such that $\Omega = \lim_{n \to \infty} \omega_n$. The set

$$R = \{(n, k) : \text{the } n\text{th bit of } \omega_k \text{ is a one}\}$$

is a recursively enumerable (even recursive) set. The main step is to use a
theorem due to J. P. Jones and Yu.V. Matijasevich [J. Symbol. Logic 49(1984),
818-829] to the effect that 'every recursively enumerable set $R$ has a singlefold
exponential Diophantine representation $A(\cdot, \cdot)$'. That is, $A(p, y) = 0$ is
an exponential Diophantine equation, and the singlefoldedness consists in the
property that $p \in R$ iff there is a $y$ such that $A(p, y) = 0$ is satisfied, and,
moreover, there is only a single such $y$. (Here both $p$ and $y$ can be multituples
of integers, in our case $p$ represents $(n, x_1)$, and $y$ represents $(x_2, \ldots, x_m)$. For
technical reasons we consider as proper solutions only solutions $x$ involving no
negative integers.) It follows that there is an exponential Diophantine equation
$A(n, k, x_2, \ldots, x_m) = 0$ which has exactly one solution $x_2, \ldots, x_m$ if the $n$th
bit of the binary expansion of $\omega_k$ is a one, and it has no solution $x_2, \ldots, x_m$
otherwise. Consequently, the number of different $m$-tuples $x_1, x_2, \ldots, x_m$ which
are solutions to $A(n, x_1, x_2, \ldots, x_m) = 0$ is infinite if the $n$th bit of the binary
expansion of $\Omega$ is a one, and this number is finite otherwise. □


4.5 Randomness of Individual Sequences Resolved 

The notion of randomness of an infinite sequence in the sense of Martin-Löf,
as possessing all effectively testable properties of randomness (one of which is
unpredictability), has turned out to be identical with the notion of an infinite
sequence having the prefix Kolmogorov complexity of all finite initial segments
of at least the length of the initial segment itself (Theorem ^). This equivalence of
a single notion being defined by two completely different approaches is a truly 
remarkable fact. (To be precise, the so-called prefix Kolmogorov complexity 
of each initial segment of the infinite binary sequence must not decrease more 
than a fixed constant, depending only on the infinite sequence, below the length 
of that initial segment, [0.) This property sharply distinguishes the random 
infinite binary sequences from the nonrandom ones. The set of random infinite 
binary sequences has uniform measure one. That means that as the outcome 
from independent flips of a fair coin they occur with probability one. 

For finite binary sequences the distinction between randomness and non- 
randomness cannot be abrupt, but must be a matter of degree. For example, 
it would not be reasonable if one string is random but becomes nonrandom if 
we flip the first nonzero bit. In this context too it has been shown that finite
binary sequences which are random in Martin-Löf's sense correspond to those
sequences which have Kolmogorov complexity at least their own length. Space
limitations forbid a complete treatment of these matters here. Fortunately, it
can be found elsewhere [10].

5 Applications 
5.1 Prediction 

We are given an initial segment of an infinite sequence of zeros and ones. Our 
task is to predict the next element in the sequence: zero or one? The set of 
possible sequences we are dealing with constitutes the 'sample space'; in this 
case, the set of one-way infinite binary sequences. We assume some probability
distribution $\mu$ over the sample space, where $\mu(x)$ is the probability of the initial
segment of a sequence being $x$. Then the probability of the next bit being '0',
after an initial segment $x$, is clearly $\mu(0 \mid x) = \mu(x0)/\mu(x)$. This problem
constitutes, perhaps, the central task of inductive inference and artificial intelligence.
However, the problem of induction is that in general we do not know the
distribution $\mu$, preventing us from assessing the actual probability. Hence, we have
to use an estimate.

Now assume that $\mu$ is computable. (This is not very restrictive, since any
distribution used in statistics is computable, provided the parameters are
computable.) We can use Kolmogorov complexity to give a very good estimate of $\mu$.
This involves the so-called 'universal distribution' $\mathbf{M}$. Roughly speaking, $\mathbf{M}(x)$
is close to $2^{-l}$, where $l$ is the length in bits of the shortest effective description
of $x$. Among other things, $\mathbf{M}$ has the property that it assigns at least as high
a probability to $x$ as any computable $\mu$ (up to a multiplicative constant factor
depending on $\mu$ but not on $x$). What is particularly important to prediction is
the following.

Let $S_n$ denote the $\mu$-expectation of the square of the error we make in
estimating the probability of the $n$th symbol by $\mathbf{M}$. Then it can be shown that the
sum $\sum_n S_n$ is bounded by a constant. In other words, $S_n$ converges to zero faster
than $1/n$. Consequently, any actual (computable) distribution can be estimated
and predicted with great accuracy using only the single universal distribution.
This approach is due to Raymond J. Solomonoff (1926-), |l^, |l^, and
predates Kolmogorov's invention of complexity. This approach has led to several
developments in inductive reasoning, including the celebrated and widely applied
'minimum message length' of Chris S. Wallace (1933-) and 'minimum
description length (MDL)' of Jorma Rissanen (1932-) model selection methods
in statistics [Chris S. Wallace and David M. Boulton, An information measure
for classification, Computer Journal, 11(1968), 185-195; Jorma Rissanen,
Modeling by the shortest data description, Automatica-J.IFAC, 14(1978), 465-471;
Jorma Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific
Publishing Company, Singapore, 1989], by the analysis in [10].
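
The bounded total error can be observed in a toy setting. The sketch below is only an illustration under strong assumptions: the universal distribution $\mathbf{M}$ is not computable, so we substitute a Bayesian mixture over a small finite class of Bernoulli measures (the class `thetas`, the uniform prior `weights`, and all function names are our own choices, not part of the text). For such a mixture the summed expected squared error is known to be at most $\frac{1}{2}\ln(1/w)$, where $w$ is the prior weight of the true measure; this plays the role of the constant bound on $\sum_n S_n$.

```python
from math import comb, log


def mixture_predict_one(k, n, thetas, weights):
    """Mixture probability that the next bit is 1 after seeing k ones in n bits.

    The mixture is a stand-in (an assumption) for the universal distribution M.
    """
    # Unnormalised posterior weight of each candidate Bernoulli parameter.
    post = [w * t ** k * (1 - t) ** (n - k) for t, w in zip(thetas, weights)]
    return sum(p * t for p, t in zip(post, thetas)) / sum(post)


def expected_squared_errors(true_theta, thetas, weights, horizon):
    """Return [S_1, ..., S_horizon]: the mu-expected squared prediction errors."""
    errors = []
    for n in range(horizon):              # n bits have been observed so far
        s = 0.0
        for k in range(n + 1):            # k of them are ones
            prob = comb(n, k) * true_theta ** k * (1 - true_theta) ** (n - k)
            guess = mixture_predict_one(k, n, thetas, weights)
            s += prob * (guess - true_theta) ** 2
        errors.append(s)
    return errors


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]        # hypothetical model class
weights = [0.2] * 5                       # uniform prior over the class
errors = expected_squared_errors(0.7, thetas, weights, horizon=150)
total = sum(errors)
bound = 0.5 * log(1 / 0.2)                # (1/2) ln(1/w) for the true measure
print(f"sum of S_n over 150 steps = {total:.4f}, bound = {bound:.4f}")
```

The partial sums stay below the bound however large the horizon is chosen, which is the finite-class analogue of the statement above.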

5.2 Gödel's incompleteness result

We say that a formal system (definitions, axioms, rules of inference) is consistent 
if no statement which can be expressed in the system can be proved to be both 
true and false in the system. A formal system is sound if only true statements 
can be proved to be true in the system. (Hence, a sound formal system is 
consistent.) 

Let $x$ be a finite binary string. We write '$x$ is random' if the shortest binary
description of $x$ with respect to the optimal specification method $D_0$ has length
at least $l(x)$, the length of $x$. A simple counting argument shows that there are
random $x$'s of each length.
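
The counting argument can be spelled out numerically. The following sketch assumes only that descriptions are binary strings and that each description names at most one string; the function names are ours.

```python
def descriptions_shorter_than(n):
    """Number of binary strings (candidate descriptions) of length < n."""
    return sum(2 ** i for i in range(n))   # 2^0 + ... + 2^(n-1) = 2^n - 1


def min_strings_with_complexity_at_least(n, c=0):
    """Lower bound on how many length-n strings have no description
    shorter than n - c bits (requires 0 <= c <= n)."""
    return 2 ** n - descriptions_shorter_than(n - c)


for n in (1, 8, 64):
    # 2^n strings of length n, but only 2^n - 1 shorter descriptions:
    # at least one string of each length is random.
    print(n, min_strings_with_complexity_at_least(n))
```

With `c=1` the same count shows that more than half of all strings of length $n$ cannot be compressed by even one bit.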

Fix any sound formal system $F$ in which we can express statements like '$x$
is random'. Suppose $F$ can be described in $f$ bits; assume, for example, that
this is the number of bits used in the exhaustive description of $F$ in the first
chapter of the textbook Foundations of $F$. We claim that, for all but finitely
many random strings $x$, the sentence '$x$ is random' is not provable in $F$. Assume
the contrary. Then given $F$, we can start to exhaustively search for a proof that
some string of length $n \gg f$ is random, and print it when we find such a string
$x$. This procedure to print $x$ of length $n$ uses only $\log n + f$ bits of data, which
is much less than $n$. But $x$ is random by the proof and the fact that $F$ is sound.
Hence, $F$ is not consistent, which is a contradiction. This type of argument is
due to Janis M. Barzdins (1937-) and later G.J. Chaitin, see [10].

This shows that although most strings are random, it is impossible to ef- 
fectively prove them random. In a way, this explains why the incompressibility 
method is so successful. We can argue about a 'typical' individual element, 
which is difficult or impossible by other methods. 

5.3 Lower bounds 

The secret of the successful use of descriptional complexity arguments as a 
proof technique is due to a simple fact: the overwhelming majority of strings 
have almost no computable regularities. We have called such a string 'random'. 
There is no shorter description of such a string than the literal description: it 
is incompressible. Incompressibility is a noneffective property in the sense of
Example 5.2.

Traditional proofs often involve all instances of a problem in order to
conclude that some property holds for at least one instance. The proof would be
simpler if only that one instance could be used in the first
place. Unfortunately, that instance is hard or impossible to find, and the proof
has to involve all the instances. In contrast, in a proof by the incompressibility
method, we first choose a random (that is, incompressible) individual object
that is known to exist (even though we cannot construct it). Then we show that
if the assumed property did not hold, then this object could be compressed,
and hence it would not be random. Let us give a simple example appearing in
[ pi] [Tot . A proof using the probabilistic method appears in [Paul Erdős and Joel
Spencer, Probabilistic Methods in Combinatorics, Academic Press, New York,
1974].

A tournament $T$ is defined to be a complete directed graph. That is, for each
pair of nodes $i$ and $j$, exactly one of edges $(i, j)$ or $(j, i)$ is in $T$. The nodes of a
tournament can be viewed as players in a game tournament. If $(i, j)$ is in $T$, we
say player $j$ dominates player $i$. We call $T$ transitive if $(i, j), (j, k)$ in $T$ implies
$(i, k)$ in $T$.

Let $\Gamma = \Gamma_n$ be the set of all tournaments on $N = \{1, \ldots, n\}$. Given a
tournament $T \in \Gamma$, fix a standard encoding $E : \Gamma \to \{0, 1\}^{n(n-1)/2}$, one bit for
each edge. The bit for edge $(i, j)$ is set to 1 if $i < j$ and 0 otherwise. There is a
1-1 correspondence between the members of $\Gamma$ and the binary strings of length
$n(n-1)/2$.
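
As an illustration of the standard encoding, here is a minimal sketch. It assumes the unordered pairs are listed in lexicographic order (the text does not fix an order; any fixed order works), and the names `encode` and `decode` are ours.

```python
from itertools import combinations


def encode(n, edges):
    """Encode a tournament on nodes 1..n as a bit string of length n(n-1)/2.

    `edges` is a set of directed pairs containing exactly one of (i, j), (j, i)
    for every pair of distinct nodes. The bit for a pair is '1' when the edge
    present is (i, j) with i < j, and '0' when it is (j, i).
    """
    bits = []
    for i, j in combinations(range(1, n + 1), 2):   # all pairs with i < j
        if (i, j) in edges:
            bits.append("1")
        elif (j, i) in edges:
            bits.append("0")
        else:
            raise ValueError(f"no edge between {i} and {j}")
    return "".join(bits)


def decode(n, bits):
    """Inverse of encode: rebuild the edge set from the bit string."""
    pairs = combinations(range(1, n + 1), 2)
    return {(i, j) if b == "1" else (j, i) for (i, j), b in zip(pairs, bits)}


cycle = {(1, 2), (2, 3), (3, 1)}      # the 3-cycle, a non-transitive tournament
code = encode(3, cycle)
print(code, decode(3, code) == cycle)
```

Since `decode` inverts `encode`, every bit string of length $n(n-1)/2$ corresponds to exactly one tournament, which is the 1-1 correspondence used in the proof.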

Let $v(n)$ be the largest integer such that every tournament on $N$ contains a
transitive subtournament on $v(n)$ nodes.

Theorem 5 $v(n) \leq 1 + \lfloor 2 \log n \rfloor$.

Proof. Fix $T \in \Gamma$ such that

$$C(E(T) \mid n, p) \geq n(n-1)/2, \qquad (5)$$

where $p$ is a fixed program that on input $n$ and $E'(T)$ (below) outputs $E(T)$.
Let $S$ be the transitive subtournament of $T$ on $v(n)$ nodes. We try to compress
$E(T)$, to an encoding $E'(T)$, as follows.

1. Prefix the list of nodes in $S$ in order of dominance to $E(T)$, each node
using $\lceil \log n \rceil$ bits, adding $v(n) \lceil \log n \rceil$ bits.

2. Delete all redundant bits from the $E(T)$ part, representing the edges
between nodes in $S$, saving $v(n)(v(n) - 1)/2$ bits.

Then,

$$l(E'(T)) = l(E(T)) - \frac{v(n)}{2} \left( v(n) - 1 - 2 \lceil \log n \rceil \right). \qquad (6)$$

Given $n$, the program $p$ reconstructs $E(T)$ from $E'(T)$. Therefore,

$$C(E(T) \mid n, p) \leq l(E'(T)). \qquad (7)$$

Equations (5), (6), and (7) can only be satisfied with $v(n) \leq 1 + \lfloor 2 \log n \rfloor$. □

The general idea used in the incompressibility proof of Theorem 5 is the following. If
each tournament contains a large transitive subtournament, or any other 'regular'
property for that matter, then also a tournament $T$ of maximal complexity
contains one. But the regularity induced by a too large transitive subtournament
can be used to compress the description of $T$ to below its complexity,
leading to the required contradiction.
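
For very small $n$ the quantity $v(n)$ can be computed by exhaustive search and compared with the bound of the theorem. The sketch below is a brute-force check, not part of the proof; it assumes logarithms are base 2, uses nodes numbered from 0, and relies on the standard fact that a tournament on $k$ nodes is transitive exactly when its out-degrees are $0, 1, \ldots, k-1$. All names are ours.

```python
from itertools import combinations, product
from math import floor, log2


def largest_transitive(n, beats):
    """Size of the largest transitive subtournament.

    beats[(a, b)] (with a < b) is True when the edge runs from a to b.
    """
    def out_degree(v, subset):
        d = 0
        for u in subset:
            if u == v:
                continue
            a, b = min(u, v), max(u, v)
            source = a if beats[(a, b)] else b
            d += source == v
        return d

    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            scores = sorted(out_degree(v, subset) for v in subset)
            if scores == list(range(k)):      # score sequence 0, 1, ..., k-1
                return k
    return 0


def v(n):
    """Largest k such that every tournament on n nodes has a transitive
    subtournament on k nodes (exhaustive, feasible only for tiny n)."""
    pairs = list(combinations(range(n), 2))
    best = n
    for bits in product([False, True], repeat=len(pairs)):
        best = min(best, largest_transitive(n, dict(zip(pairs, bits))))
    return best


for n in range(2, 6):
    print(n, v(n), 1 + floor(2 * log2(n)))
```

The search enumerates all $2^{n(n-1)/2}$ tournaments, so it is only practical for $n$ up to about 6.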

Results using the incompressibility method can (perhaps) always be rewrit- 
ten using the probabilistic method or counting arguments. The incompressibility 
argument seems simpler and more intuitive. It is easy to generalize the above 
arguments from proving 'existence' to proving 'almost all'. Almost all strings 
have high complexity. Therefore, almost all tournaments and almost all undi- 
rected graphs have high complexity. Any combinatorial property proven about 
an arbitrary complex object in such a class will hold for almost all objects in 
the class. That is, such properties are subject to a Kolmogorov complexity
0-1 Law: they either hold by a Kolmogorov complexity argument for almost
all objects in the class or for no objects in the class. For example, the proof
of Theorem 5 can trivially be strengthened as below. By simply counting the
number of binary programs of length less than the righthand side of Equation (8),
there are at least $2^{n(n-1)/2}(1 - 1/n)$ tournaments $T$ on $n$ nodes with

$$C(E(T) \mid n, p) \geq n(n-1)/2 - \log n. \qquad (8)$$

This is a $(1 - 1/n)$th fraction of all tournaments on $n$ nodes. Using Equation (8)
in place of Equation (5) in the proof yields the statement below.

For almost all tournaments on $n$ nodes (at least a $(1 - 1/n)$th fraction),
the largest transitive subtournament has at most $1 + 2 \lceil 2 \log n \rceil$
nodes, from some $n$ onwards.

In [10], Chapter 6, the incompressibility method is explained in more detail.
Its utility is demonstrated in a variety of examples of proving mathematical
and computational results. These include questions concerning the average case
analysis of algorithms (such as Heapsort), sequence analysis, average case
complexity in general, formal languages, combinatorics, time and space complexity
analysis of various sequential or parallel machine models, language recognition, 
and string matching. Other applications include the use of resource-bounded 
Kolmogorov complexity in the analysis of computational complexity classes, the 
universal optimal search algorithm, and 'logical depth'. 



5.4 Statistical Properties of Finite Sequences 

Each individual infinite sequence generated by a $(\frac{1}{2}, \frac{1}{2})$ Bernoulli process
(flipping a fair coin) has (with probability 1) the property that the relative
frequency of zeros in an initial $n$-length segment goes to $\frac{1}{2}$ as $n$ goes to infinity.
Such randomness-related statistical properties of individual (high) complexity
finite binary sequences are often required in applications of incompressibility
arguments. The situation for infinite random sequences is studied above.



We know from Section i.4 that each infinite sequence which is random with 
respect to the uniform measure satisfies all effectively testable properties of 
randomness: it is normal, it satisfies the so-called Law of the Iterated Logarithm,
the number of 1's minus the number of 0's in an initial $n$-length segment is
positive for infinitely many $n$ and negative for another infinitely many $n$, and so
on. While the statistical properties of infinite sequences are simple corollaries
of the theory of Martin-Löf randomness, for finite sequences the situation is less
simple. 

In the finite case, randomness is a matter of degree, because it would be 
clearly unreasonable to say that a sequence x of length n is random and to say 
that a sequence $y$ obtained by flipping the first '1' bit of $x$ is nonrandom. What
we can do is to express the degree of incompressibility of a finite sequence in the
form of its Kolmogorov complexity, and then analyze the statistical properties
of the sequence, for example, the number of 0's and 1's in it, as in jl^, [T^ .

Since almost all finite sequences have about maximal Kolmogorov complexity,
each statistical property which is possessed by each individual maximal
complexity sequence must also hold approximately in the expected sense (on
average) for the overall set. Let us look at the converse. The fact that some
property holds on the average over the members of a set does not in general
imply that it is present in most individual members. For example, if the
set is $\{00 \ldots 0, 11 \ldots 1\}$ where both sequences have length $m$, then the average
relative frequency of 1's over initial segments of length $n$ is $1/2$ for each $n$ with
$1 \leq n \leq m$. But for each individual member of this set the relative frequency
of 1's in the initial segments is either zero or one.

In contrast, for infinite sequences all effectively testable properties of
randomness hold with probability one and therefore are also expected or average
properties. We have seen that these properties must also hold with certainty for
each individual sequence with high enough Kolmogorov complexity, and that
these sequences have probability one in the set of all sequences. That is, the
Kolmogorov random elements in a given set with uniform probability distribution
possess individually each effectively testable property which is shared by almost
all elements of the set. We cannot expect that this phenomenon also holds with
this sharpness for the finite sequences, if only because we cannot divide the
set of finite sequences sharply into sequences which are random and sequences
which are not random (unlike the infinite sequences). However, it turns out
that randomness properties hold approximately for finite sequences with high
enough Kolmogorov complexity.

For example, we can a priori state that each high-complexity finite binary 
sequence is 'normal' in the sense that each binary block of length $k$ occurs
about equally frequently for $k$ relatively small. In particular, this holds for $k = 1$.
However, in many applications we need to know exactly what 'about' and the 
'relatively small' in this statement mean. In other words, we are interested in 
the extent to which 'Borel normality' holds in relation with the complexity of a 
finite sequence. 

Let $x$ have length $n$. It can be shown (and follows from the stronger result
below) that if $C(x \mid n) = n + O(1)$, then the number of zeros it contains is

$$\frac{n}{2} + O(\sqrt{n}).$$

Notation 1 The quantity $K(x \mid y)$ in this section satisfies

$$C(x \mid y) \leq K(x \mid y) \leq C(x \mid y) + 2 \log C(x \mid y) + 1.$$

This is the length of a self-delimiting version of a program $p$ of length
$l(p) = C(x \mid y)$, what we defined as 'prefix complexity'.
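
One common way to make a program self-delimiting is to prefix it with its own length, where the length field is itself made self-delimiting by a unary header. The sketch below shows this particular construction as an assumption on our part (the text does not say which encoding is intended); its overhead is $2 \lfloor \log l(p) \rfloor + 3$ bits, which matches the bound above up to a small additive constant. The names `encode` and `decode` are ours.

```python
def encode(p):
    """Self-delimiting version of the bit string p: 1^k 0 <len(p) in k bits> p."""
    length_bits = format(len(p), "b")
    return "1" * len(length_bits) + "0" + length_bits + p


def decode(stream):
    """Read one self-delimiting program from the front of `stream`.

    Returns (program, rest_of_stream); no end marker is needed.
    """
    k = stream.index("0")                         # unary header gives k
    length = int(stream[k + 1:2 * k + 1], 2)      # next k bits give len(p)
    start = 2 * k + 1
    return stream[start:start + length], stream[start + length:]


stream = encode("10110") + encode("0")            # two programs, concatenated
first, rest = decode(stream)
second, rest = decode(rest)
print(first, second, repr(rest))
```

Because the decoder knows where each program ends, concatenated programs can be parsed unambiguously, which is the property that distinguishes prefix complexity from plain complexity.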

Definition 8 The class of deficiency functions is the set of functions $\delta : \mathbb{N} \to \mathbb{N}$
satisfying $K(n, \delta(n) \mid n - \delta(n)) \leq c_1$ for all $n$. (Hence, $C(n, \delta(n) \mid n - \delta(n)) \leq c_1$
for all $n$.)

This way we can retrieve $n$ and $\delta(n)$ from $n - \delta(n)$ by a self-delimiting
program of at most $c_1$ bits. We choose $c_1$ so large that each monotone sublinear
recursive function that we are interested in, such as $\log n$, $\sqrt{n}$, $\log \log n$, is
such a deficiency function. The constant $c_1$ is a benchmark which stays fixed
throughout this section.
Lemma 1 There is a constant $c$, such that for all deficiency functions $\delta$, for
each $n$ and $x \in \{0, 1\}^n$, if $C(x) > n - \delta(n)$, then

$$\left| \#\mathrm{ones}(x) - \frac{n}{2} \right| \leq \sqrt{(\delta(n) + c)\, n \ln 2}. \qquad (9)$$

Proof. A general estimate of the tail probability of the binomial distribution,
with $S_n$ the number of successful outcomes in $n$ experiments with
probability of success $0 < p < 1$ and $q = 1 - p$, is given by Chernoff's bounds,

$$\Pr(|S_n - np| > m) \leq 2 e^{-m^2/4npq}. \qquad (10)$$

Let $S_n$ be the number of 1's in the outcome of $n$ fair coin flips, which means
that $p = q = 1/2$. Defining $A = \{x \in \{0, 1\}^n : |\#\mathrm{ones}(x) - n/2| > m\}$ and
applying Equation (10), we obtain

$$d(A) \leq 2^{n+1} e^{-m^2/n}.$$

Let $m = \sqrt{(\delta(n) + c)\, n \ln 2}$ where $c$ is a constant to be determined later. We
can compress any $x \in A$ in the following way.

1. Let $s$ be a self-delimiting program to retrieve $n$ and $\delta(n)$ from $n - \delta(n)$,
of length at most $c_1$.

2. Given $n$ and $\delta(n)$, we can effectively enumerate $A$. Let $i$ be the index of $x$
in such an effective enumeration of $A$. The length of the (not necessarily
self-delimiting) description of $i$ satisfies

$$l(i) \leq \log d(A) \leq n + 1 + \log e^{-m^2/n} \leq n + 1 - \delta(n) - c.$$

The string $si$ is padded to length $n + 1 - \delta(n) - c + c_1$. From $si$ we can
reconstruct $x$ by first using $l(si)$ to compute $n - \delta(n)$, then compute $n$ and $\delta(n)$
from $s$ and $n - \delta(n)$, and subsequently enumerate $A$ to obtain the $i$th element.
Let $T$ be the Turing machine embodying the procedure for reconstructing $x$.
Then, by definition of $C(\cdot)$,

$$C(x) \leq C_T(x) + c_T \leq n + 1 - \delta(n) - c + c_1 + c_T.$$

Choosing $c = 1 + c_1 + c_T$ we find $C(x) \leq n - \delta(n)$, which contradicts the
condition of the lemma. Hence, $|\#\mathrm{ones}(x) - n/2| \leq m$. □
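
The counting step of the proof can be checked numerically for moderate $n$. The sketch below counts the strings in $A$ exactly with binomial coefficients and compares the count with $2^{n+1} e^{-m^2/n}$; the sample values of `n`, `delta` and `c` are arbitrary choices of ours, and the function names are hypothetical.

```python
from math import comb, exp, log, log2, sqrt


def tail_count(n, m):
    """Exact number of x in {0,1}^n with |#ones(x) - n/2| > m."""
    return sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > m)


def chernoff_count_bound(n, m):
    """Upper bound 2^(n+1) * exp(-m^2 / n) on d(A) for fair coin flips."""
    return 2 ** (n + 1) * exp(-m * m / n)


n, delta, c = 64, 6, 2                     # illustrative values only
m = sqrt((delta + c) * n * log(2))         # the choice of m made in the proof
print("d(A)             =", tail_count(n, m))
print("bound            =", chernoff_count_bound(n, m))
print("log2 of bound    =", log2(chernoff_count_bound(n, m)))
print("n + 1 - delta - c =", n + 1 - delta - c)
```

With this choice of $m$ the bound equals $2^{n+1-\delta(n)-c}$, so an index into $A$ needs at most $n + 1 - \delta(n) - c$ bits, as used in step 2.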

It may be surprising at first glance, but there are no maximally complex
sequences with about equal numbers of zeros and ones. Equal numbers of zeros
and ones is a form of regularity, and therefore lack of complexity. That is,
for $x \in \{0, 1\}^n$, if $|\#\mathrm{ones}(x) - n/2| = O(1)$, then the randomness deficiency
$\delta(n) = n - C(x)$ is nonconstant (order $\log n$).
The analysis up till now has been about the statistics of 0's and 1's. But in a
normal infinite binary sequence according to Definition ^ each block of length $k$
occurs with limiting frequency of $2^{-k}$. That is, blocks 00, 01, 10, and 11 should
occur about equally often, and so on. Finite sequences will generally not be
exactly normal, but normality will be a matter of degree. We investigate the
block statistics for finite binary sequences.

Definition 9 Let $x = x_1 \ldots x_n$ be a binary string of length $n$, and $y$ a much
smaller string of length $l$. Let $p = 2^{-l}$ and $\#y(x)$ be the number of (possibly
overlapping) distinct occurrences of $y$ in $x$. For convenience, we assume that $x$
'wraps around' so that an occurrence of $y$ starting at the end of $x$ and continuing
at the start also counts.

Theorem 6 Assume the notation of Definition 9 with $l \leq \log n$. There is a
constant $c$, such that for all $n$ and $x \in \{0, 1\}^n$, if $C(x) > n - \delta(n)$, then

$$|\#y(x) - np| \leq \sqrt{\alpha n p},$$

with $\alpha = [K(y \mid n) + \log l + \delta(n) + c](1 - p)\, l\, 4 \ln 2$.
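
The block count with wrap-around is easy to compute directly. The following sketch is an illustration with names of our own choosing; it assumes $l \leq n$, as the definition intends.

```python
from itertools import product


def count_occurrences(x, y):
    """Number of (possibly overlapping) occurrences of y in x, where x is
    treated as circular so that occurrences may wrap around the end."""
    n, l = len(x), len(y)
    extended = x + x[:l - 1]              # append the first l-1 bits of x
    return sum(1 for i in range(n) if extended[i:i + l] == y)


x = "0110100110010110"                    # an arbitrary sample string
l = 2
for block in ("".join(b) for b in product("01", repeat=l)):
    # Compare the observed count with the expected value n * p = n * 2^(-l).
    print(block, count_occurrences(x, block), len(x) * 2 ** -l)
```

Every starting position contributes exactly one block of length $l$, so the counts over all $2^l$ blocks always sum to $n$; the theorem bounds how far each individual count can stray from $np$.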

It is known from probability theory that in a randomly generated finite 
sequence the expectation of the length of the longest run of zeros or ones is pretty 
high. For each individual finite sequence with high Kolmogorov complexity we 
are certain that it contains each block (say, a run of zeros) up to a certain length. 

Theorem 7 Let $x$ of length $n$ satisfy $C(x) > n - \delta(n)$. Then, for sufficiently
large $n$, $x$ contains all blocks $y$ of length

$$l = \log n - \log \log n - \log(\delta(n) + \log n) - O(1).$$

Corollary 3 If $\delta(n) = O(\log n)$, then each block of length $\log n - 2 \log \log n -
O(1)$ is contained in $x$.

Analyzing the proof of Theorem |^ we can improve this when K{y\n) is low. 

Corollary 4 If $\delta(n) = O(\log \log n)$, then for each $\epsilon > 0$ and $n$ large enough,
$x$ contains an all-zero run $y$ (for which $K(y \mid n) = O(\log l)$) of length $l = \log n -
(1 + \epsilon) \log \log n + O(1)$. Since at least a fraction of $1 - 1/\log n$ of all strings
of length $n$ has such a $\delta(n)$, the result gives the expected value of the length
of the longest all-zero run. This is almost a $\log \log n$ additional term better
than the expectation reported using simple probabilistic methods in [Thomas
H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Introduction to Algorithms,
MIT Press, Cambridge, Mass., 1990].
5.5 Chaos and Predictability

Given sufficient information about a physical system, like the positions, masses 
and velocities of all particles, and a sufficiently powerful computer with enough 
memory and computation time, it should be possible in principle to compute all 
of the past and all of the future of the system. This view, eloquently propagated 
by P.S. Laplace, can be espoused both in classical mechanics and quantum me- 
chanics. In classical mechanics one would talk about a single 'history', while in 
quantum mechanics one would talk about probability distributions over an en- 
semble of 'possible histories.' Nonetheless, in practice it is impossible to obtain 
all parameters precisely. The finitary nature of measurement and computation 
requires truncation of real valued parameters; there are measuring errors; and 
according to basic quantum mechanics it is impossible to measure certain types 
of parameters simultaneously and precisely. Altogether, it is fundamental that 
there are minute uncertainties in our knowledge of any physical system at any 
time. 

This effect can be combined with the consistent tradition that small causes
can have large effects, exemplified by the metaphor "a butterfly moving its wing
in tropical Africa can eventually cause a cyclone in the Caribbean." Minute
perturbations in initial conditions can cause, mediated by strictly computable
functions, arbitrarily large deviations in outcome. In the mathematics of
nonlinear deterministic systems this phenomenon has been described by the catch
term 'chaos'.

The unpredictability of this phenomenon is sometimes explained through 
Kolmogorov complexity. Let us look at an example where unpredictability is 
immediate (without using Kolmogorov complexity). 

The following mathematical conditions are satisfied by classic thermody- 
namic systems. Each point in a state space X describes a possible micro state 
of the system. Starting from each position, the point representing the system 
describes a trajectory in X. We take time to be discrete and let t range over 
the integers. 

If the evolution of the system in time can be described by a transformation
group $U^t$ of $X$ such that $U^t \omega$ is the trajectory starting from some point $\omega$ at
time 0, then we call the system dynamical. Assume that $U^t \omega$ is computable as
a function of $t$ and $\omega$ for both positive and negative $t$.

Definition 10 Assume that the involved measure $\mu$ is recursive and satisfies
$\mu(U^t V) = \mu(V)$ for all $t$ and all measurable sets $V$ (like the $\Gamma_x$'s), as in Liouville's
Theorem. The $\mu$-measure of a volume $V$ of points in state space is invariant over
time. A physical system $\mathcal{X}$ consists of the space $X$, $n$-cells $\Gamma_x$, transformation
group $U^t$, and the recursive and volume invariant measure $\mu$.

Consider a system $\mathcal{X}$ whose orbit is the sequence of subsequent
micro states $\omega, U\omega, U^2\omega, \ldots$.
We cannot measure states which are (or involve) real numbers precisely.
Because of the finite precision of our measuring instruments, the number of possible
observed states is finite. Hence, we can as well assume that the observable state 
space S of our system contains only a finite number of states, and constitutes 
a finite partition of X. Then, each micro state in X is observed as being an 
element of some observable state in S. 

For convenience we assume that $S = \{\Gamma_0, \Gamma_1\}$ is the set of observed states.
Here $\Gamma_0$ is the set of micro states starting with a '0', and $\Gamma_1$ is the set of micro
states starting with a '1'.

Let $\Gamma^n$ be the observed state of $\mathcal{X}$ at time $n$. Given an initial observed
segment $\Gamma^0, \ldots, \Gamma^{n-1}$ of an observed orbit we want to compute the next
observable state $\Gamma^n$. Even if it would not be possible to compute it, we would like to
compute a prediction of it which does better than a random coin flip.

A well-known example of a chaotic system is the doubling map which
results from the so-called baker's map by deleting all bits with non-positive
indexes from each state. That is, each micro state $\omega$ is a one-way infinite binary
sequence. Considering this as the binary expansion of a real number after the
binary point, the set of micro states is the real interval $[0, 1)$. The system evolves
according to the transformation

$$\omega_{n+1} = 2\omega_n \pmod 1 \qquad (11)$$

where 'mod 1' means drop the integer part. All iterates of Equation (11) lie in
the unit interval $[0, 1)$. The observable states are $\Gamma_0 = [0, \frac{1}{2})$ and $\Gamma_1 = [\frac{1}{2}, 1)$.

The doubling map is related to the discrete logistic equation

$$Y_{n+1} = \alpha Y_n (1 - Y_n)$$

which maps the unit interval upon itself when $0 \leq \alpha \leq 4$. When
$\alpha = 4$, setting $Y_n = \sin^2 \pi X_n$, we get precisely

$$X_{n+1} = 2X_n \pmod 1.$$
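
The substitution can be verified numerically for a single step: since $4 \sin^2 \theta \, (1 - \sin^2 \theta) = \sin^2 2\theta$, applying the logistic map with $\alpha = 4$ to $\sin^2 \pi X$ gives $\sin^2 \pi (2X \bmod 1)$. The sketch below checks this in floating point; the function names are ours, and only one iteration is compared because rounding errors grow quickly under a chaotic map.

```python
from math import isclose, pi, sin


def doubling(x):
    """One step of the doubling map on [0, 1)."""
    return (2 * x) % 1.0


def logistic4(y):
    """One step of the logistic equation with parameter alpha = 4."""
    return 4 * y * (1 - y)


def to_logistic(x):
    """Change of variable Y = sin^2(pi * X)."""
    return sin(pi * x) ** 2


samples = [0.0, 0.1, 0.25, 0.3, 0.5, 0.77, 0.999]
for x in samples:
    lhs = logistic4(to_logistic(x))       # step in the logistic variable
    rhs = to_logistic(doubling(x))        # step in the doubling variable
    print(x, isclose(lhs, rhs, abs_tol=1e-9))
```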

Assuming that the initial state is randomly drawn from $[0, 1)$ according to the
uniform measure $\lambda$, we can use complexity arguments to show that the doubling
map's observable orbit cannot be predicted better than a coin toss [Joseph Ford,
'How random is a random coin toss?', Physics Today, 36(1983), April issue, pp.
40-47].

Namely, with $\lambda$-probability 1 the drawn initial state will be a Martin-Löf
random infinite sequence since we have shown that they have uniform measure
one in the set of infinite sequences. Such random sequences by definition cannot
be effectively predicted better than a random coin toss [10].

But in this case we do not need to go to such trouble. The observed orbit 
essentially consists of the consecutive bits of the initial state. Selecting that 
initial state randomly from the uniform measure is isomorphic to flipping a fair 
coin to generate it. 
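
That the observed orbit reads off the bits of the initial state can be seen in a small exact computation. The sketch below uses rational arithmetic to avoid rounding; it assumes an initial state with a finite binary expansion purely for illustration, and the function names are ours.

```python
from fractions import Fraction


def from_bits(bits):
    """The real number 0.b1 b2 b3 ... (binary) as an exact fraction."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


def observed_orbit(omega, steps):
    """Observed states of the doubling map: 0 if the micro state lies in
    [0, 1/2), and 1 if it lies in [1/2, 1)."""
    states = []
    for _ in range(steps):
        states.append(0 if omega < Fraction(1, 2) else 1)
        omega = (2 * omega) % 1           # drop the integer part
    return states


bits = [1, 0, 1, 1, 0, 0, 1]
omega = from_bits(bits)
print(omega, observed_orbit(omega, len(bits)))
```

Each application of the map shifts the binary expansion one place to the left, so predicting the next observed state amounts to predicting the next bit of the initial state.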